Inputs

Data Insights receives log data through inputs, which serve as entry points into the system. Inputs are distinct from streams, which route data, and index sets, which store data.

Input categories

There are two categories of inputs:

  • Listener (Input Profiles) - These inputs wait on a network port for applications to send data to Data Insights. They can use either Transmission Control Protocol (TCP) or User Datagram Protocol (UDP), depending on the input type.

    These inputs can be found under General > Enterprise > Forwarders > Input Profile > (select an Input Profile) > Inputs.

    Note

    TCP inputs are more reliable because each message is acknowledged at the network level. UDP inputs provide higher throughput but do not guarantee delivery.

  • Pull (Inputs) - These inputs connect to an endpoint and retrieve log data using an API or other supported methods. They typically require authentication to the device or service they are pulling from.

    These inputs can be found under General > System > Inputs.

Setting up a new Input

This article contains information on creating an Input, configuring its parameters, and preparing it to receive log data.

Inputs are entry points for incoming data in Data Insights. They define how Data Insights communicates with data sources such as servers, applications, and network devices, enabling the platform to receive log messages.

Prerequisites

Ensure that you have a log source capable of sending data in a format compatible with the selected Data Insights Input type. Refer to the documentation for the specific input type for detailed setup and configuration instructions.

Creating a new input

To create a new input, follow these steps:

  1. Select the System menu on the upper side of the screen.

    Data_Insights_System_Menu_1402376_en.png
  2. Select Inputs from the dropdown menu.

    Data_Insights_System_Menu__Inputs_1402376_en.png

    The Inputs page is displayed.

  3. Select the Input type you want to configure from the Select Input dropdown menu.

    Data_Insights_Inputs_new_menu_1402376_en.png
  4. Select Launch new input.

    Data_Insights_Inputs_Launch_new_Input_1402376_en.png

    The Launch new Input window is displayed.

  5. Fill in all the required information to configure the Input.

  6. Select Launch Input.

The new input is displayed on the Inputs page. It is initially in a stopped state, and no logs are received from the source. To complete the configuration and begin routing data, select Setup Input next to the newly created input.

Setting up an input

After you create an input, it is displayed in a stopped state and does not receive logs from the source. To complete the configuration and enable data routing, select Setup Input next to the newly created input.

To set up the input, follow these steps:

  1. Under the Local Inputs section, select the Setup Input button for the newly created Input.

    Data_Insights_Inputs_Launch_set_up_Input_1402376_en.png

    The Input Setup Wizard window is displayed. Setup begins with a list of the Illuminate Processing Packs associated with the selected input.

    These packs contain parsing rules that convert incoming log data into the Common Information Model (GIM) schema, providing normalization and enrichment. For example, selecting one of the Bitdefender Illuminate Packages, such as the GravityZone or Telemetry package, automatically applies the appropriate parsing and enrichment logic for those data sources. In contrast, selecting a GELF HTTP input does not display any Processing or Spotlight Packs, because Illuminate does not provide packs for generic GELF inputs.

  2. Under the Select Illuminate Packs tab, select one of these options:

    • Select the Illuminate Packs you want to use. If a Content or Spotlight pack is already installed, it appears on the list as a non-selectable option.

      Note

      Ingested log data is processed by Illuminate and routed to the corresponding Illuminate stream and index set.

      After selecting the necessary Illuminate Processing Packs, Data Insights shows the available Content and Spotlight Packs associated with the input. These packs install dashboards, Sigma rules, and events that work with logs parsed into the GIM schema.

    • If Illuminate content does not exist or is not available, select Skip Illuminate to configure your data routing preferences.

      You can now select one of two options:

      Tip

      It is recommended to create a new stream for each input to keep log data organized and categorized efficiently.

      • Route data to an existing stream (select Select Stream).

        If you choose an existing stream, its attached configurations also apply. Data Insights attaches a default immutable pipeline called All Messages Routing to the All Messages stream; this pipeline cannot be detached, deleted, renamed, or modified.

      • Create a new stream (select Create Stream).

        During routing configuration, you can also create pipelines and index sets.

        After selecting this option, follow these steps:

        1. Enter a title – Provide a descriptive name for the stream to help identify it later.

        2. Add a description (optional) – Include details about the stream’s purpose or the type of data it will process.

        3. Select Remove matches from ‘Default Stream’ – Enable this option to prevent messages that match this stream from also appearing in the Default Stream. This avoids message duplication.

        4. Select Create a new pipeline for this stream – Check this option to automatically create a dedicated pipeline for the stream. Pipelines define how messages are processed (for example, filtered or enriched).

        5. Create a new index set or select an existing one.

        6. Select Next to continue to the Launch tab.

  3. Select Start Input.

  4. (Optional) Select Launch Input Diagnosis.

After finishing the input configuration, the Input Diagnosis page is displayed. This page provides an overview of the input’s current status, message flow, and troubleshooting information. It shows details such as the input title and type, node status, received message counts, traffic metrics, and any message errors. Use this page to verify that the input is running correctly and that messages are being received as expected.

Input types

Inputs are the entry points through which Security Data Lake receives log and event data from various sources. Each input type determines how data is collected and transmitted to the platform.

There are two main categories of inputs:

  • Listener Inputs – These inputs wait for incoming messages from external systems. They open a network port or endpoint and continuously “listen” for data sent by devices, agents, or applications. Listener inputs are commonly used for real-time log streaming over protocols such as TCP, UDP, HTTP, or gRPC (for example, Syslog, GELF, or OpenTelemetry).

  • Pull Inputs – These inputs actively connect to remote services or APIs to retrieve log data at regular intervals. They are typically used to collect data from cloud platforms, security tools, and SaaS applications (for example, AWS CloudTrail, Microsoft 365, or CrowdStrike).

Using the appropriate input type ensures that Security Data Lake can efficiently receive and process messages from both on-premises systems and cloud-based sources. The following table lists all supported inputs, their type, and where each can be configured within the Data Insights environment.

Input | Type | Available under System | Available under Forwarder
--- | --- | --- | ---
AWS CloudTrail | Pull | Yes | No
AWS Kinesis/CloudWatch | Pull | Yes | Yes
AWS S3 | Pull | Yes | Yes
AWS Security Lake | Pull | Yes | Yes
Azure Event Hubs | Pull | Yes | No
Beats | Listener | No | Yes
Beats Kafka | Listener (Kafka consumer) | No | Yes
Bitdefender GravityZone | Pull | No | Yes
CEF (CEF AMQP, CEF Kafka, CEF TCP, CEF UDP) | Listener | No | Yes
Cluster-to-Cluster Forwarder | Forwarder/output connector (not an input) | No | Yes
Cloudflare Logpush with Raw HTTP | Listener | No | Yes
CrowdStrike | Pull | Yes | No
GELF (GELF AMQP, GELF HTTP, GELF TCP, GELF UDP) | Listener | Yes | Yes
GELF Kafka | Listener (Kafka consumer) | No | Yes
Google Workspace (GCP Log Events) | Pull | Yes | No
IPFIX | Listener | No | Yes
JSON Path value from HTTP API | Pull | Yes | Yes
Microsoft Defender for Endpoint | Pull | Yes | No
Microsoft Graph | Pull | Yes | No
Microsoft Office 365 (Office 365 Log Events) | Pull | No | Yes
Mimecast | Pull | Yes | No
NetFlow (NetFlow UDP) | Listener | No | Yes
Okta Log Events | Pull | Yes | Yes
OpenTelemetry (gRPC) | Listener | No | Yes
Palo Alto Networks | Pull | No | Yes
Random HTTP message generator | Generator (test/synthetic) | Yes | Yes
Raw HTTP (Plaintext AMQP, Plaintext Kafka, Plaintext TCP, Plaintext UDP) | Listener | Yes | Yes
Syslog (AMQP, Kafka, TCP, UDP) | Listener | No | Yes
Salesforce | Pull | Yes | No
Sophos Central | Pull | Yes | No
Symantec SES Events | Pull | Yes | No

Configuring an AWS CloudTrail input

The AWS CloudTrail input allows Data Insights to read log messages from the AWS CloudTrail service. CloudTrail logs are generated by AWS whenever any action takes place within your account. These logs are useful for tracking user activity, API usage, and changes to your AWS resources.

To configure an AWS CloudTrail input, follow these steps:

1. Make sure the prerequisites are met

  • A valid AWS account with Amazon CloudTrail enabled.

2. Create a Trail with AWS CloudTrail

  1. Start by configuring trail attributes.

    Create Trail.png
  2. Select the following options:

    • Trail name: Provide a unique name.

    • Enable for all accounts in my organization: Select this check box to apply the trail to all accounts in your organization.

    • Storage location: Create a new S3 bucket or use an existing S3 bucket. Message contents are stored in the bucket.

    • Trail log bucket name: Enter a unique S3 bucket name. This location is where CloudTrail writes the payload of each message. Data Insights reads the message content from here when it receives the SNS message from the queue.

    Additional settings:

    • Log file SSE-KMS encryption: This option is enabled by default. The AWS KMS documentation provides more details.

    • Log file validation: Enable this option to have log digests delivered to your Amazon S3 bucket.

    • SNS notification delivery: Enable.

    • Create a new SNS topic: Specify a name for the topic (for example cloudtrail-log-write) or select one of the existing topics. This name is needed to configure the Data Insights input.

    Enabling CloudWatch Logs and adding tags is optional.

  3. Select what types of events you want to log, for example, management events, data events, or insight events.

  4. Review and complete the setup.
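
If you prefer to script this step, the trail can also be created with the AWS CLI. The following is a minimal sketch; the trail, bucket, and topic names are placeholders, and the multi-region flag is an assumption you can drop:

# Sketch: create the CloudTrail trail, attach the SNS topic, and start logging.
# All names below are placeholders; adjust them to your environment.
aws cloudtrail create-trail \
  --name data-insights-trail \
  --s3-bucket-name <trail-log-bucket-name> \
  --sns-topic-name cloudtrail-log-write \
  --is-multi-region-trail
aws cloudtrail start-logging --name data-insights-trail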

3. Set up SQS for CloudTrail Write Notifications

  1. Go to Amazon SQS and create a queue. All settings can be left at their default values initially.

    Create Queue.png
  2. Specify a queue name (for example cloudtrail-notifications). This name is needed to configure the Data Insights input. CloudTrail writes notifications with S3 file name references to this queue.

  3. Subscribe the SQS queue to your CloudTrail SNS topic.

    SNS Subscription.png
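
As an alternative to the console, this step can be sketched with the AWS CLI; the queue name matches the example above and the ARNs are placeholders. Note that the queue's access policy must also allow the SNS topic to deliver messages to it:

# Sketch: create the notification queue and subscribe it to the CloudTrail SNS topic.
aws sqs create-queue --queue-name cloudtrail-notifications
aws sns subscribe \
  --topic-arn arn:aws:sns:<region>:<account-number>:cloudtrail-log-write \
  --protocol sqs \
  --notification-endpoint arn:aws:sqs:<region>:<account-number>:cloudtrail-notifications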

4. Ensure HTTPS Communication

This input uses the AWS SDK to communicate with various AWS resources. Therefore, HTTPS communication must be allowed between the Data Insights server and those resources. If communication on the network segment containing the Data Insights cluster is restricted, ensure that communication to the following endpoints is explicitly permitted:

monitoring.<region>.amazonaws.com
cloudtrail.<region>.amazonaws.com
sqs.<region>.amazonaws.com
sqs-fips.<region>.amazonaws.com
<bucket-name>.s3-<region>.amazonaws.com 

5. Configure the Input in Data Insights

During the configuration, fill in fields based on your preferences:

Field | Value
--- | ---
Title | Enter a unique name for the input.
AWS SQS Region | Select the AWS region where the SQS queue is located.
AWS S3 Region | Select the AWS region of the S3 bucket that stores CloudTrail logs.
SQS Queue Name | Enter the name of the SQS queue that receives CloudTrail notifications from SNS.
Enable Throttling | Stops reading new data when the system falls behind on message processing, allowing Data Insights to catch up.
AWS Access Key (optional) | The identifier for the AWS IAM user. See the Credential settings retrieval order documentation for details.
AWS Secret Key (optional) | The secret access key for the IAM user with permissions to access the subscriber and SQS queue.
AWS Assume Role (ARN) (optional) | Use this setting for cross-account access.
Override Source (optional) | Overrides the default source value, which is normally derived from the hostname in the received packet. Enter a custom string to override the source field.
Encoding (optional) | Specifies the encoding expected by the input. For example, UTF-8 encoded messages should not be sent to an input configured for UTF-16.

Troubleshooting

If the CloudTrail input starts and debug logs show messages are received but none appear in search, verify that the SQS subscription is not configured to deliver messages in raw format.
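
One way to check this, assuming you use the AWS CLI and know the subscription ARN, is to inspect the subscription attributes; RawMessageDelivery should report false:

# Sketch: confirm raw message delivery is disabled on the SNS-to-SQS subscription.
aws sns get-subscription-attributes \
  --subscription-arn <subscription-arn> \
  --query 'Attributes.RawMessageDelivery'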

Configuring an AWS Kinesis/CloudWatch Input

The AWS Kinesis/CloudWatch Input allows Security Data Lake to read log messages from CloudWatch via Kinesis.

To configure an AWS Kinesis/CloudWatch Input, follow these steps:

1. Make sure the prerequisites are met

Kinesis is required to stream messages to Security Data Lake before messages can be read from CloudWatch.

Important

The following message types are supported:

  • CloudWatch Logs: Raw text strings within CloudWatch.

  • CloudWatch Flow Logs: Flow Logs within a CloudWatch log group.

  • Kinesis Raw Logs: Raw text strings written to Kinesis.

2. Set up the flow

The following steps describe how to add the AWS Kinesis/CloudWatch input to Security Data Lake using the automated setup. For this setup to function as expected, the Recommended Policy must be allowed for the authorized user (see Permission Policies below).

  1. Perform the AWS Kinesis Authorize steps:

    1. Add the input name, AWS Access Key, AWS Secret Key, and select AWS Region to authorize Security Data Lake.

    2. Select the Authorize & Choose Stream button to continue.

  2. Perform the AWS Kinesis Setup:

    1. In the dialog box, select the Setup Kinesis Automatically button.

    2. Enter a name for the Kinesis stream and select a CloudWatch log group from the drop-down list.

    3. Select Begin Automated Setup.

      A Kinesis Auto Setup Agreement prompt will appear.

    4. Read the agreement, and click I Agree! Create these AWS resources now.

    The auto-setup summary lists and references the resources that were created.

  3. Click Continue Setup to proceed.

  4. On the AWS CloudWatch Health Check, Security Data Lake reads a message from the Kinesis stream and checks its format. Security Data Lake attempts to automatically parse the message if it is of a known type.

  5. For AWS Kinesis Review, review and finalize the details for the input to complete.

To set up the input manually, the Least Privilege Policy must be allowed for the authorized user (see Permission Policies below).

  1. Complete AWS Kinesis Authorize steps as follows:

    1. Type in the input name, AWS Access Key, AWS Secret Key, and select AWS Region to authorize Security Data Lake. Click the Authorize & Choose Stream button to continue.

  2. Complete AWS Kinesis Setup as follows:

    1. Select the Kinesis stream to pull logs from.

    2. Click Verify Stream & Format to continue.

  3. On the AWS CloudWatch Health Check, Security Data Lake reads a message from the Kinesis stream and checks its format. Security Data Lake attempts to automatically parse the message if it is of a known type.

  4. For AWS Kinesis Review, review and finalize the details for the input to complete.

Permission Policies

Manual Setup Flow Permissions

You can find the minimum permissions required for the input on this AWS page. The page also includes detailed descriptions of these permissions and an example policy.

Automatic Setup Flow Permissions

The automatic setup requires all permissions from the manual setup plus the additional permissions listed below.

  • iam:CreateRole

  • iam:GetRole

  • iam:PassRole

  • iam:PutRolePolicy

  • kinesis:CreateStream

  • kinesis:DescribeStream

  • kinesis:GetRecords

  • kinesis:GetShardIterator

  • kinesis:ListShards

  • kinesis:ListStreams

  • logs:DescribeLogGroups

  • logs:PutSubscriptionFilter
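
For illustration, an IAM policy granting these additional permissions might be created as sketched below; the policy name is a placeholder, and you may want to scope Resource more narrowly than the wildcard used here:

# Sketch: create a policy with the extra permissions needed for automatic setup.
aws iam create-policy \
  --policy-name kinesis-auto-setup-extra \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "iam:CreateRole", "iam:GetRole", "iam:PassRole", "iam:PutRolePolicy",
          "kinesis:CreateStream", "kinesis:DescribeStream", "kinesis:GetRecords",
          "kinesis:GetShardIterator", "kinesis:ListShards", "kinesis:ListStreams",
          "logs:DescribeLogGroups", "logs:PutSubscriptionFilter"
        ],
        "Resource": "*"
      }
    ]
  }'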

AWS S3 Input

The AWS S3 input collects log files published to an Amazon S3 bucket. As new files are published, they are ingested automatically. Supported formats include Comma-Separated Values (CSV), Security Data Lake Extended Log Format (GELF), newline-delimited logs (one message per line), and JSON root array messages (multiple log messages in a single JSON array). The input uses Amazon Simple Queue Service (SQS) bucket notifications to detect when new data is available for Security Data Lake to read.

To configure the AWS S3 Input, follow these steps:

1. Make sure the prerequisites are met

  • An Amazon Web Services (AWS) subscription.

  • A defined S3 bucket to which logs may be written.

2. Create an IAM role and assign permissions

For Security Data Lake to connect to AWS S3, you must create an Identity and Access Management (IAM) role with permissions to read the target SQS queue and the S3 bucket. The following Amazon permissions are required for the input to function:

  • s3:GetObject

  • sqs:ReceiveMessage
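
As an illustration, an inline policy granting these two permissions to the role could be attached as follows; the role name, bucket name, and queue ARN are placeholders:

# Sketch: attach an inline policy with the two permissions listed above.
aws iam put-role-policy \
  --role-name security-data-lake-s3-input \
  --policy-name s3-input-read \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      { "Effect": "Allow", "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::<s3-bucket-name>/*" },
      { "Effect": "Allow", "Action": "sqs:ReceiveMessage",
        "Resource": "arn:aws:sqs:<region>:<account-number>:<queue-name>" }
    ]
  }'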

3. Create and configure an SQS queue

Create an SQS queue that Security Data Lake can subscribe to in order to receive notifications of new files to read. Most default options can be accepted.

Note

An access policy must be defined to allow the S3 bucket to publish notifications to the queue. The following sample policy authorizes S3 to publish notifications to the queue:

{
  "Version": "2012-10-17",
  "Id": "example-ID",
  "Statement": [
    {
      "Sid": "s3-publish-policy",
      "Effect": "Allow",
      "Principal": {
        "Service": "s3.amazonaws.com"
      },
      "Action": "SQS:SendMessage",
      "Resource": "arn:aws:sqs:<region>:<account-number>:<queue-name>",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "<account-number>"
        },
        "ArnLike": {
          "aws:SourceArn": "arn:aws:s3:::<s3-bucket-name>"
        }
      }
    }
  ]
}

Note

For more information on enabling and configuring notifications in the Amazon S3 console, refer to this Amazon KB article.

4. Set Up an S3 Bucket

You need an S3 bucket to store log message files. If one does not exist, create it using this Amazon KB article. After the bucket is created, enable Event Notifications following these steps:

  1. When configuring notifications, in the Event Types section, select All object create events. This ensures the Input is notified regardless of how files are created.

    AWS S3 input-1.png
  2. In the Destination section, select the SQS queue that was created above.

    AWS S3 input-2.png
  3. Test the notification capability between the SQS queue and the S3 bucket after both are set up.

Note

For more information on setting up an S3 bucket, review this Amazon KB article.
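
The same notification configuration can be sketched with the AWS CLI; the bucket name, region, account number, and queue name are placeholders:

# Sketch: send "all object create" event notifications from the bucket to the SQS queue.
aws s3api put-bucket-notification-configuration \
  --bucket <s3-bucket-name> \
  --notification-configuration '{
    "QueueConfigurations": [
      {
        "QueueArn": "arn:aws:sqs:<region>:<account-number>:<queue-name>",
        "Events": ["s3:ObjectCreated:*"]
      }
    ]
  }'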

5. Configure the Input in Security Data Lake

Configure the Input by entering the following values:

Field

Value

Input Name

A unique name for the input.

AWS Authentication Type

The input supports automatic authentication using the predefined authentication chain in the AWS Software Development Kit (SDK). This option is typically used when an IAM policy is attached to the instance running Security Data Lake. The input follows the authentication methods in the order defined in the AWS documentation.

If the Key & Secret option is selected, the input also supports the ability to enter an AWS API Access Key and Secret Key.

AWS Access Key ID: The access key ID generated for the user with required permission to the S3 bucket and the SQS queue associated with the S3 bucket. These AWS credentials can be configured in Security Data Lake.

SQS Queue name

The name of the queue that will be receiving S3 event notifications.

S3 Bucket

The S3 bucket where log files are written.

S3 Region

The region where the S3 bucket is located.

AWS Assume Role (ARN)

Use this setting to enable cross-account access.

Content Type

The format of the log files in the S3 bucket. For CSV, newline values within individual fields are supported.

  • CSV: Comma-separated values.

    "field 1", "field 2"
    "same line field", "field with line breaks"
  • Security Data Lake Extended Log Format (GELF)

  • Text (Newline Delimited): One log message per line. The input creates one message in Security Data Lake for each line.

  • JSON Array: Expects a JSON array at the root of the document containing either strings, or individual JSON documents. For example:

    ["log message 1", "log message 2", ...]
    [{"key": "value"}, {"key2": "value2"}, ...]

Compression Type

The compression type of the log files. Use this if log files are written within compressed archives.

Supported options:

  • GZIP

  • None

Polling Interval

Determines how often, in minutes, Security Data Lake checks the S3 bucket for new data. The minimum interval is 5 minutes.

Enable Throttling

If enabled, no new messages are read from this input until Security Data Lake catches up with its message load.

AWS Security Lake Input

Amazon Security Lake is a service that aggregates and manages security logs and event data. This integration ingests security logs from Amazon Security Lake into Security Data Lake. For more information on working with Amazon Security Lake, review this Amazon user guide.

To configure the AWS Security Lake Input, follow these steps:

1. Make sure the prerequisites are met

To use the AWS Security Lake input, you need an AWS account with Amazon Security Lake enabled and a subscriber configured with the appropriate Identity and Access Management (IAM) role. Security Data Lake then polls Security Lake at the configured interval and ingests new logs.

For more information, review the Amazon Security Lake documentation.

2. Set up the Security Lake service

  1. Create an AWS account and an administrative user.

  2. Verify that the AmazonSecurityLakeMetaStoreManager role is present in AWS Identity and Access Management (IAM), and create it if necessary.

  3. Assign the AmazonSecurityLakeMetaStoreManager role in AWS Identity and Access Management (IAM) to the user configured for the input.

  4. Create a Subscriber in Amazon Security Lake Console.

  5. On the Logs and events sources page, select which data sources you want to enable for the subscriber. Choose one of these options:

    • All logs and event sources - Gives access to all event and log sources.

    • Specific log and event sources - Gives access only to the sources you select.

3. Configure the Security Data Lake Input

Configure the Input by entering the following values:

Field

Value

Input Name

The unique name for the input.

AWS Access Key Id

The Access Key ID for the IAM user with permission to the subscriber and the SQS queue.

AWS Secret Access Key

The secret access key created for the IAM user.

Security Lake Region

The Security Lake region where the subscriber is created.

SQS Queue Name

The SQS queue name created by the Security Lake subscriber.

Enable Throttling

Enables Security Data Lake to stop reading new data for this input whenever the system falls behind on message processing and needs to catch up.

Store Full Message

Permits Security Data Lake to store the raw log data in the full_message field for each log message.

Warning

Enabling this option may result in a significant increase in the amount of data stored.

Supported logs and event sources

This input currently supports some top-level field parsing of the four event sources below. All other data can be manually parsed from the full_message field:

  • CloudTrail - User activity and API usage in AWS services.

  • VPC flow logs - Details about IP traffic to and from network interfaces in your VPC.

  • Route 53 - DNS queries made by resources within your Amazon Virtual Private Cloud (Amazon VPC).

  • Security Hub findings - Security findings from AWS Security Hub.

Azure Event Hubs Input

Azure Event Hubs is a fully managed, real-time data ingestion service for receiving event logs from Azure services. The Azure Event Hubs input retrieves events from an event hub and processes them in Security Data Lake.

To configure an Azure Event Hubs Input, follow these steps:

1. Make sure the prerequisites are met

An active Azure subscription with a configured Event Hub is required to use the Azure Event Hubs input.

For setup instructions, refer to the Azure documentation and its overview of Event Hubs features and terminology.

2. Configure Access for the Input in Azure Event Hub

After Azure Event Hub is configured and receiving log events, follow these steps to configure the Azure Event Hubs Input to connect and read events:

  1. Add a Shared Access Signature (SAS) policy to allow the input to access and communicate with your Event Hub.

    Note

    Consult the Azure documentation for security and management best practices before creating the policy.

  2. To create a policy, follow these steps:

    1. On the Event Hub page, select Shared access policies from the navigation menu.

      shared access policies.png

    2. Select the New button at the top to create the policy.

    3. Select the Listen permission. (Security Data Lake only needs to read events from Event Hub.)

      add SAS policy.png

After defining the policy, note the primary or secondary connection string. This connection string is required to configure the input in Security Data Lake.

3. Configure a Consumer Group

A consumer group is required for the Azure Event Hubs input to read events from Event Hub. Azure provides a $Default consumer group, which is sufficient for Security Data Lake to ingest logs. If you have created a custom consumer group, you can specify it in the Security Data Lake configuration.

The Security Data Lake Azure Event Hubs input currently only supports running on a single Security Data Lake node, so there is no need to configure a consumer group with additional concurrent readers at this time.

4. Configure the Security Data Lake Input

Configure the Input by entering the following values:

Parameter | Description
--- | ---
Input Name | A unique name for the input.
Azure Event Hub Name | The name of your Event Hub within the Azure console.
Connection String | The primary or secondary connection string defined in the Shared Access Signature policy created earlier.
Consumer Group | The consumer group from which to read events. Use $Default if you have not defined a custom consumer group for your event hub.
Proxy URI | The HTTPS forward proxy URI used for Azure communication (if proxy support is enabled).
Maximum Batch Size | The maximum batch size to wait for when the input reads from Event Hub. The input will block and wait for the specified batch size to be reached before querying the event hub.
Maximum Wait Time | The maximum time to wait for the Maximum Batch Size above to be reached.
Store Full Message | Stores the entire message payload received from Azure Event Hubs.

Proxy Support

The input can be configured to use a forward proxy to relay communication with Azure through a proxy host. Only HTTPS-capable forward proxies are supported.

When proxy support is enabled, the connection to Azure uses port 443 with the AMQP over WebSockets protocol.

Store Full Message

Azure Event Hub can store full messages from Azure log data. This option allows you to manually parse data from all Azure log message types using processing pipelines. To enable it, select Store Full Message in the Azure Event Hub Integrations menu.

Azure Event Hub Event Sources

This input supports parsing and ingesting the following Azure event log types. For instructions on forwarding events from these services to Event Hub, refer to the Azure documentation.

  • Azure Active Directory (audit and sign in logs)

  • Azure Audit

  • Azure Network Watcher

  • Azure Kubernetes Service

  • Azure SQL

Beats Input

Beats are open-source data shippers that run as lightweight agents on your servers, purpose-built to collect and forward specific types of operational and security data. These single-purpose agents are developed primarily by Elastic and the open source community. Each Beat is tailored to a specific use case. Here are some examples:

Beat Name | Purpose
--- | ---
Filebeat | Ships log files (e.g., /var/log/*.log)
Winlogbeat | Ships Windows Event Logs
Metricbeat | Collects system and service metrics
Packetbeat | Captures and analyzes network traffic
Auditbeat | Monitors file integrity and audit logs
Community Beats | Specialized shippers created by the community

The Beats input in Security Data Lake ingests log data directly from Beats shippers and performs basic parsing of the data. In most cases, the Logstash output from Beats can send messages to Security Data Lake without additional configuration. Some Beats may require adjusted settings.

Beats Output: Sending Logs to Security Data Lake

To send data from Beats to Security Data Lake, configure Beats to use the Logstash output plugin over TCP. This is compatible with the Beats input type in Security Data Lake, which implements the same protocol used by Logstash Beats receivers.

Tip

The Security Data Lake Beats input only supports TCP, not UDP. Always configure Beats to use TCP for output.
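
As a minimal sketch, the Logstash output section of filebeat.yml could be appended as shown below; the host and port are placeholders for wherever your Beats input listens, and any other output already configured in the file must be disabled first:

# Sketch: point Filebeat's Logstash output at the Beats input over TCP.
cat >> /etc/filebeat/filebeat.yml <<'EOF'
output.logstash:
  hosts: ["security-data-lake-host:5044"]   # Beats input host and port (placeholders)
EOF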

TLS and Authentication

Security Data Lake’s Beats input supports TLS encryption for secure log transport and can be configured to use client certificates for authentication. Refer to the TLS documentation (Secure Graylog and Beats Input) for setup instructions.

Beats Kafka Input

The Beats Kafka input supports collecting logs from Kafka topics. When logs are generated by Beats data shippers and pushed to a Kafka topic, they are automatically ingested and parsed by this input.

Prerequisites

  • Install Beats, Kafka, and Zookeeper.

  • Provide full access permissions to all Kafka and Filebeat folders.

  • Configure the filebeat.yml file as shown below:

    filebeat.inputs:
      - type: log
        enabled: true
        paths:
          - /var/log/syslog

    output.kafka:
      hosts: ["your_kafka_host:9092"] # Replace with your Kafka host(s)
      topic: 'system_logs' # Name of the Kafka topic
      codec.json:
        pretty: false
      preset: balanced

  • Configure the Kafka server.properties file:

    advertised.listeners=PLAINTEXT://localhost:9092

  • Create a Kafka topic:

    Go to the Kafka directory bin folder and execute the following command:

    ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic <Topic name>

Note

Remember to replace localhost with your unique IP address.

Security Data Lake Input Configuration

When launching a new input from the Security Data Lake Inputs tab, the following options are available:

Parameter | Description
--- | ---
Title | Enter a unique name for the input.
Bootstrap Servers | Enter the IP address and port on which the Kafka server is running.
Zookeeper address (optional) | Enter the IP address and port on which the Zookeeper server is running.
Topic filter regex | Enter the topic name filter matching the topic configured in the filebeat.yml file.
Fetch minimum bytes | Enter the minimum byte size a message batch should reach before fetching.
Fetch maximum wait time | Enter the maximum time (in milliseconds) to wait before fetching.
Processor threads | Enter the number of processor threads. This setting is based on the number of partitions available for the topic.
Auto offset reset (optional) | Choose the appropriate option from the drop-down menu if there is no initial offset in Kafka or if an offset is out of range.
Consumer group identifier (id) (optional) | Enter the name of the consumer group the Kafka input belongs to.
Override source (optional) | The source defaults to the hostname derived from the received packet. Set this field only if you want to override it with a custom string.
Encoding (optional) | Default encoding is UTF-8. Set this to a standard charset name if you want to override the default.
Custom Kafka properties (optional) | Provide additional Kafka properties, one per line.

Bitdefender GravityZone input

The Bitdefender GravityZone input supports collecting logs published from Bitdefender GravityZone through two methods:

  • Event Push service – Sends batches of CEF messages over HTTPS directly to a local forwarder.

  • Security Telemetry (raw events) – Sends syslog-formatted raw events to a local forwarder, which then forwards them to Security Data Lake (see Syslog Event Types).

Both methods enable the ingestion of Bitdefender endpoint security data into Security Data Lake for monitoring, alerting, and enrichment through the Illuminate Packages.

To configure the Bitdefender GravityZone Input, follow these steps:

1. Make sure the prerequisites are met

For Security Telemetry (Syslog)

  • A valid Bitdefender GravityZone subscription.

  • A running Forwarder connected to your cluster.

  • Firewall rules allowing inbound syslog traffic (default port 514, or a custom port if configured).

  • A configured Security Telemetry policy in GravityZone.

  • Network connectivity between the BEST endpoints and the Forwarder receiving events.

  • TLS is optional but recommended if telemetry data is sent over external or untrusted networks.

For Event Push (CEF over HTTPS)

  • A valid Bitdefender GravityZone subscription.

  • Network connectivity between the GravityZone environment and the local forwarder receiving HTTPS requests.

  • TLS 1.2 or higher must be available for encrypted data transfer.

  • The Authorization Header option must be configured to ensure authenticated message delivery.

  • One of the following configuration options must be implemented:

    • Ensure that the local Forwarder has a public IP address and is configured to accept connections only from the Bitdefender GravityZone cloud. Then configure Event Push to send logs to the local Forwarder.

    • Deploy the Event Push Connector, assign it a public IP address, and ensure it accepts connections only from the Bitdefender GravityZone cloud. Then configure Event Push to send logs directly to the Event Push Connector.

2. Set up the GravityZone console

Method 1 – Bitdefender Security Telemetry (for raw events)

Bitdefender Security Telemetry provides raw endpoint event data over syslog. These logs can be collected by a Graylog Forwarder configured with a syslog input profile, which receives telemetry events locally and routes them to the Graylog cluster.

  1. Create a syslog input profile

    1. Go to System > Forwarders, then open the Input Profiles tab.

    2. Select New Input Profile and create a new profile.

    3. From the profile page, select Create Input and select Syslog TCP.

    4. Specify a listening port (for example, 1514).

    5. Under Authorization Header Name, enter authorization.

    6. Enter an Authorization Header Value.

    7. Optionally, enable TLS or adjust buffer settings as needed.

    8. Save the input profile and assign it to the desired forwarder.

  2. Configure Security Telemetry in GravityZone

    1. Log in to the GravityZone console, then go to the Policies page.

    2. Open the policy applied to the endpoints that you want to receive data from.

    3. Go to the Agent > Security Telemetry page.

    4. Enable security telemetry, configure your SIEM connection settings, and enable the types of events you want to track.

      Note

      Enter the IP address of your local forwarder, and the port number entered when creating your syslog input profile (step 1, substep d).

  3. (Optional) Verify event flow

    1. Check the forwarder logs or the Security Data Lake message input stream to confirm that telemetry events are being received.

    2. Use the Bitdefender Telemetry Illuminate Package to parse and normalize the incoming events for analysis.

Method 2 – Bitdefender GravityZone Event Push (CEF in JSON over HTTPS)

The Event Push integration allows GravityZone to send batches of CEF-formatted events encapsulated in JSON POST requests over HTTPS directly to an input. This method is ideal for environments that prefer outbound HTTPS connections over syslog-based delivery.

  1. Enable Push Service

    Set up the GravityZone Push service to send logs to the input.

  2. Set up API access

    Note

    GravityZone Push uses API endpoints for configuration. Authentication is done using a Base64-encoded API Key (followed by a trailing colon).

    Generate the API Key from the My Account section.
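
    For example, the header value can be produced on the command line as sketched below; note the trailing colon appended to the key before encoding:

    # Sketch: Base64-encode the GravityZone API key followed by a trailing colon.
    printf '%s:' '<GravityZone API key>' | base64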

  3. Configure Push settings

    Use the setPushEventSettings API request to configure GravityZone to send logs to Security Data Lake. Set the following parameters:

    • serviceType – cef

    • serviceSettings

      • url – The URL where the input listens (for example, https://host:port/bitdefender).

        Note

        The port must always have the 5555 value. The host must be the IP or hostname of the local forwarder.

      • authorization – A password that matches the Authorization Header Value configured in the input (step 1, substep f).

      • requireValidSslCertificate – true

Example API request:

curl -i -X POST https://cloud.gravityzone.bitdefender.com/api/v1.0/jsonrpc/push \
  -H "Authorization: <base64-encoded GravityZone API Key followed by trailing colon>" \
  -H "Content-Type: application/json" \
  -d '{
        "params": {
          "status": 1,
          "serviceType": "cef",
          "serviceSettings": {
            "url": "https://<host:port>/bitdefender",
            "authorization": "<input Authorization Header Value>",
            "requireValidSslCertificate": true
          },
          "subscribeToEventTypes": {
            <include desired event types>
          }
        },
        "jsonrpc": "2.0",
        "method": "setPushEventSettings",
        "id": "d0bcb906-d0b7-4b5f-b29f-b2e8c459a2df"
      }'

After completing the configuration, use the sendTestPushEvent API request to verify that messages are received by Security Data Lake.
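
A test request might look like the following sketch; the eventType value shown here is an assumption, so check the GravityZone API reference for the event types available in your console:

# Sketch: trigger a GravityZone test push event to verify end-to-end delivery.
# The "eventType" value is an assumption; adjust it to a type you subscribed to.
curl -i -X POST https://cloud.gravityzone.bitdefender.com/api/v1.0/jsonrpc/push \
  -H "Authorization: <base64-encoded GravityZone API Key followed by trailing colon>" \
  -H "Content-Type: application/json" \
  -d '{
        "params": { "eventType": "av" },
        "jsonrpc": "2.0",
        "method": "sendTestPushEvent",
        "id": "b3f2a7c1-4d9e-4f10-8a6b-2c5d9e7f1a34"
      }'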

3. Configure the Security Data Lake Input

Configure the Input by entering the following values:

Field

Values

Title

A meaningful name used to identify the input. Example: Bitdefender GravityZone – Push Input.

Bind address

The IP address the input listens on. Use 0.0.0.0 to listen on all interfaces or 127.0.0.1 for local-only connections.

Port

The port number the input listens on. Ensure this port is reachable from Bitdefender GravityZone.

Timezone

The timezone of timestamps in incoming CEF messages. Use the local timezone if unsure. Example: +01:00 or America/Chicago.

Receive buffer size (optional)

The buffer size in bytes for network connections. Default: 1048576. Adjust if handling high-volume traffic.

No. of worker threads (optional)

The number of threads used to process network connections. Increase this value for high-throughput environments.

TLS cert file (optional)

Path to the TLS certificate file. Required if enabling TLS for secure HTTPS connections.

TLS private key file (optional)

Path to the TLS private key file associated with the certificate.

Enable TLS

Enables TLS for incoming HTTPS connections. Required when the GravityZone Push service uses HTTPS.

Important

This checkbox must always be selected.

TLS key password (optional)

The password used to decrypt an encrypted private key file, if applicable.

TLS client authentication (optional)

Specifies whether clients must authenticate with a certificate during the TLS handshake.

TLS client auth trusted certs (optional)

File or directory path containing trusted client certificates if mutual TLS authentication is required.

TCP keepalive

Enables TCP keepalive packets to maintain persistent connections. Recommended for long-lived sessions.

Enable bulk receiving

Enables handling of newline-delimited messages in bulk requests. Required for GravityZone Event Push batches.

Important

This checkbox must always be selected.

Enable CORS

Adds CORS headers to HTTP responses for browser-based requests. Typically not required for Event Push.

Max. HTTP chunk size (optional)

The maximum size in bytes of an HTTP request body. Default: 8192. Increase this value if receiving large message batches.

Idle writer timeout (optional)

The time (in seconds) before closing an idle client connection. Use 0 to disable timeout. Default: 60.

Authorization header name

The name of the authorization header used for authentication. Example: authorization.

Authorization header value

The secret value clients must include in the authorization header. Example: Bearer <token>. Must match the value set in the GravityZone Push configuration.

Locale (optional)

Locale used for parsing timestamps in CEF messages. Default: en. Examples: en or en_US.

Use full field names

Enables full field names in CEF messages as defined in the CEF specification. Recommended for compatibility with enrichment packs.

4. Integrate Illuminate packs with the input

When configuring a Forwarder input, select the appropriate Illuminate Processing Pack for the data source. These packs define the parsing and normalization logic that convert incoming log data into the Common Information Model (GIM) schema, enabling enrichment and correlation.

Choose one of the Bitdefender Illuminate Packages—for example, GravityZone or Telemetry—to automatically apply the correct enrichment and mapping rules for those event types.

Note

You can review or update Illuminate Pack assignments later under System > Forwarders > Input Profiles.

CEF Inputs

Common Event Format (CEF) is an extensible, text-based format designed to support multiple device types. CEF defines a syntax for log records comprising a standard header and a variable extension, formatted as key-value pairs.

Most network and security systems support either Syslog or CEF as a means for sending data. Security Data Lake can ingest CEF messages over UDP or TCP, or via Kafka or AMQP as a queuing system.

CEF TCP

To launch a new CEF TCP input:

  1. Navigate to System > Inputs.

  2. Select CEF TCP from the input options and click the Launch new input button.

    CEF TCP.png
  3. Enter your configuration parameters in the pop-up configuration form.

Configuration Parameters

  • Title

    • Assign a title to the input. Example: “CEF TCP Input for XYZ Source”.

  • Bind address

    • Enter an IP address for this input to listen on. The source system/data sends logs to this IP/input.

  • Port

    • Enter a port to use in conjunction with the IP address.

  • Timezone

    • Select the timezone of the timestamps on the system that is sending CEF messages. If the sender does not include timezone information, you can configure the timezone applied to the messages on arrival. That configuration does not overwrite a timezone included in the timestamp; it is the assumed timezone for messages that do not include timezone information.

  • Receive Buffer Size (optional)

    • Depending on the amount of traffic being ingested by the input, this value should be large enough to ensure proper flow of data but small enough to prevent the system from spending resources trying to process the buffered data.

  • No. of worker threads

    • This setting controls how many concurrent threads are used to process incoming data. Increasing the number of threads can enhance data processing speed, resulting in improved throughput. The ideal number of threads to configure depends on the available CPU cores on your Security Data Lake server. A common starting point is to align the number of worker threads with the number of CPU cores. However, it is crucial to strike a balance with other server demands.

Note

The TLS-related settings that follow ensure that only valid sources can send messages to the input securely.

  • TLS cert file (optional)

    • The certificate file that is stored on a Security Data Lake system. The value of this field is a path (/path/to/file) that Security Data Lake should have access to.

  • TLS private key file (optional)

    • The certificate private key file that is stored on a Security Data Lake system. The value of this field is a path (/path/to/file) that Security Data Lake should have access to.

  • Enable TLS

    • Select this option if this input should use TLS.

  • TLS key password (optional)

    • The private key password.

  • TLS client authentication (optional)

    • If you want to require authentication, set this value to optional or required.

  • TLS Client Auth Trusted Certs (optional)

    • The path where client (source) certificates are located on a Security Data Lake system. The value of this field is a path (/path/to/file) that Security Data Lake should have access to.

  • TCP keepalive

    • Enable this option if you want the input to support TCP keep-alive packets to prevent idle connections.

  • Null frame delimiter

    • This option is typically left unchecked; a newline is then the delimiter for each message.

  • Maximum message size (optional)

    • The maximum size of a message. The default value should suffice but can be modified depending on message length. Each input type usually has specifications that note the maximum length of a message.

  • Locale (optional)

    • This setting is used to determine the language of the message.

  • Use full field name

    • The CEF key name is usually used as the field name. Select this option if the full field name should be used.
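
After the input is launched, a quick way to confirm that it is receiving data is to send a single test message with netcat. This is only a sketch; the host and port are placeholders for your CEF TCP input's bind address and port:

# Sketch: send one CEF-formatted test event to the CEF TCP input.
printf 'CEF:0|Vendor|Product|1.0|100|Test event|5|src=10.0.0.1 dst=10.0.0.2 msg=Hello\n' \
  | nc -w1 <cef-input-host> <port>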

CEF UDP

To launch a new CEF UDP input:

  1. Navigate to System > Inputs.

  2. Select CEF UDP from the input options and click the Launch new input button.

    CEF UDP.png
  3. Enter your configuration parameters in the pop-up configuration form.

Configuration Parameters

  • Title

    • Assign a title to the input. Example: “CEF UDP Input for XYZ Source”.

  • Bind address

    • Enter an IP address for this input to listen on. The source system/data sends logs to this IP/input.

  • Port

    • Enter a port to use in conjunction with the IP address.

  • Timezone

    • Select the timezone of the timestamps on the system that is sending CEF messages. If the sender does not include timezone information, you can configure the timezone applied to the messages on arrival. That configuration does not overwrite a timezone included in the timestamp; it is the assumed timezone for messages that do not include timezone information.

  • Receive Buffer Size (optional)

    • Depending on the amount of traffic being ingested by the input, this value should be large enough to ensure proper flow of data but small enough to prevent the system from spending resources trying to process the buffered data.

  • No. of worker threads

    • This setting controls how many concurrent threads are used to process incoming data. Increasing the number of threads can enhance data processing speed, resulting in improved throughput. The ideal number of threads to configure depends on the available CPU cores on your Security Data Lake server. A common starting point is to align the number of worker threads with the number of CPU cores. However, it is crucial to strike a balance with other server demands.

  • Locale (optional)

    • This setting is used to determine the language of the message.

Cloudflare Logpush with Raw HTTP Input

Logs from the Cloudflare Logpush service (via HTTP destination) can be ingested into Security Data Lake using the Raw HTTP input. When set up and configured, Logpush posts newline-delimited batches of log messages to the input over HTTP.

General information about this input, including configuration options, may be found in the Raw HTTP Input documentation.

Note

Note that you may review an additional use case for the Raw HTTP input in GitLab Audit Event Streaming with Raw HTTP Input.

Prerequisites

Before proceeding, ensure that the following prerequisites are met:

  • A Cloudflare subscription is required.

  • The Cloudflare Logpush HTTP destination service must be able to forward to an endpoint in your environment that is secured with TLS. See Secure Inputs with TLS for more information. (You may also choose to route through a firewall or gateway to fulfill the TLS requirement.)

  • We strongly recommend using the Authorization Header option when setting up the Raw HTTP input to ensure message requests are authenticated.

Set up the Input

Navigate to System > Inputs and select Raw HTTP to launch the new input. The following configuration settings must be carefully considered when setting up this input for Cloudflare Logpush:

  • Bind Address and Port: Ensure that Cloudflare can route through your network to the IP address and port specified. Note that the Raw HTTP input listens for HTTP requests at the /raw root HTTP path.

  • TLS Settings: TLS must either be enabled for this endpoint, or you can choose to route through a firewall or gateway to fulfill the required usage of TLS.

  • Enable Bulk Receiving: Be sure to select this option. This will ensure that the input will correctly split newline-delimited batches of log messages sent from Cloudflare.

  • Authorization Header: Specify a name and value for the authorization header to use. This will ensure that the input will only accept communication where appropriate authentication is validated.

    • Authorization Header Name: authorization

    • Authorization Header Value: Choose a secure password with sufficient length and complexity to meet your requirements. Use the same value for the authorization setting in Cloudflare.

For the additional configuration settings available, see the Raw HTTP Input documentation for more details. Unless required for your environment, we recommend you use the default settings when determining these additional configuration properties.
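
Before enabling Logpush, you can smoke-test the input with a small newline-delimited batch; this sketch assumes the input listens at /raw with bulk receiving enabled and uses the authorization header configured above:

# Sketch: post an authenticated, newline-delimited test batch to the Raw HTTP input.
curl -i -X POST "https://graylog-host:port/raw" \
  -H "authorization: <Authorization Header Value>" \
  -H "Content-Type: text/plain" \
  --data-binary $'test message one\ntest message two'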

Enable the HTTP Destination in Cloudflare

After setting up the new input, you must enable the Logpush service to send logs to Security Data Lake. This is done by defining the Security Data Lake endpoint as a Logpush destination. For information on this process, see the Cloudflare documentation.

Note

Note that the first few steps described in the Cloudflare documentation direct you to select the appropriate website (i.e., domain) you want to use with Logpush. This can be done by selecting Websites from the Cloudflare management console navigation bar and clicking Add a domain. This step is essential to getting your Cloudflare logs into Security Data Lake.

When you are prompted to enter the URI where the Raw HTTP input is listening for requests, ensure the URL includes the /raw root path. For example:

https://graylog-host:port/raw?header_Authorization=<Graylog input Authorization Header Value>
Logpush_destination.png

CrowdStrike Input

This input retrieves data from the CrowdStrike API and ingests it into Security Data Lake for analysis of security events.

To configure the CrowdStrike Input, follow these steps:

1. Make sure the prerequisites are met

To allow Security Data Lake to pull data from CrowdStrike, create an API client in the CrowdStrike Falcon UI with the required scopes.

For more information, refer to the CrowdStrike documentation.

Note

You must have the Falcon Administrator role to view, create, or modify API clients or keys. Secrets are displayed only when a new API client is created or when it is reset.

2. Configure CrowdStrike

Follow these steps to define a CrowdStrike API client:

  1. Log into the Falcon UI.

  2. From the menu on the left side of the screen, select Support and resources, then API clients and keys.

    Falcon UI.png
  3. Select Add new API Client.

    1. Enter a client name and description for the new API client.

    2. In the API Scopes section, grant read permissions for Alerts and Event Streams by selecting the Read check box for each one.

      API Alerts Scope.PNG
      API Event Streams Scope.PNG
    3. Select Save.

      A Client ID and Client Secret are displayed.

      Note

      The client secret is shown only once and must be stored securely. If it is lost, you must reset it, and any application using the client secret must be updated with the new credentials.

3. Configure the Input in Security Data Lake

Configure the Input by entering the following values:

Field

Value

Input Name

A unique name for the input.

CrowdStrike Client ID

The Client ID obtained during the CrowdStrike configuration.

Client Secret

The Client secret obtained from the CrowdStrike configuration.

User Region

The CrowdStrike User Account Region.

Store Full Message

Permits Security Data Lake to store the raw log data in the full_message field for each log message.

Warning

Enabling this option may result in a significant increase in the amount of data stored.

Checkpoint Interval

Specifies how often, in seconds, Security Data Lake records checkpoints for CrowdStrike data streams. The default value is 30 seconds.

GELF

The Security Data Lake Extended Log Format (GELF) is a log format that avoids the shortcomings of classic plain syslog:

  • Limited to a length of 1024 bytes. Inadequate space for payloads like backtraces.

  • No data types in structured syslog. Numbers and strings are indistinguishable.

  • The RFCs are strict enough, but there are so many syslog dialects out there that you cannot possibly parse all of them.

  • No compression.

Syslog is sufficient for logging system messages of machines or network gear, while GELF is a strong choice for logging from within applications. There are libraries and appenders for many programming languages and logging frameworks, so it is easy to implement. Because GELF can be sent via UDP, every exception can be sent as a log message to your Security Data Lake cluster without complications from timeouts, connection problems, or anything else that might break your application from within your logging class.

GELF via UDP

Chunking

UDP datagrams are limited to a size of 65536 bytes. Some Security Data Lake components are limited to processing up to 8192 bytes. Substantial compressed information fits within the size limit, but you may have more information to send; this is why Security Data Lake supports chunked GELF.

You can define chunks of messages by prepending a byte header to a GELF message, including a message ID and sequence number to reassemble the message later. Most GELF libraries support chunking transparently and will detect if a message is too big to be sent in one datagram.

TCP would solve this problem on a transport layer, but it has other problems that are even harder to tackle: slow connections, timeouts, and other network problems.

Messages can be lost with UDP, and TCP can dismantle the whole application when not designed carefully.

Of course, especially in high-volume environments, TCP is sensible. Many GELF libraries support both TCP and UDP as transport, and some also support https.

Prepend the following structure to your GELF message to make it chunked:

  • Chunked GELF magic bytes - 2 bytes: 0x1e 0x0f

  • Message ID - 8 bytes: Must be the same for every chunk of this message. Identifies the whole message and is used to reassemble the chunks later. Generate from millisecond timestamp + hostname, for example.

  • Sequence number - 1 byte: The sequence number of this chunk starts at 0 and is always less than the sequence count.

  • Sequence count - 1 byte: Total number of chunks this message has.

All chunks MUST arrive within 5 seconds or the server will discard all chunks that have arrived or are in the process of arriving. A message MUST NOT consist of more than 128 chunks.
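
For illustration only, the following sketch splits one small, uncompressed GELF message into two chunks and sends them as separate UDP datagrams; in practice, GELF libraries handle chunking transparently. The 8-byte string GELFCHNK stands in for the message ID, and the host and port match the netcat examples later in this article:

# Sketch: chunked GELF by hand. Each datagram = magic bytes, 8-byte message ID,
# sequence number, sequence count, then part of the GELF payload.
printf '\x1e\x0fGELFCHNK\x00\x02{ "version": "1.1", "host": "example.org", "short_m' \
  | nc -w0 -u datainsights.example.com 12201
printf '\x1e\x0fGELFCHNK\x01\x02essage": "chunked hello", "level": 5 }' \
  | nc -w0 -u datainsights.example.com 12201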

Note

Please note that the UDP inputs of Security Data Lake use the SO_REUSEPORT socket option, which was introduced in Linux kernel version 3.9. UDP inputs will not work on Linux kernel versions before 3.9.

Compression

When using UDP as transport layer, GELF messages can be sent uncompressed or compressed with either GZIP or ZLIB.

Security Data Lake nodes automatically detect the compression type in the GELF magic byte header.

Decide if you want to trade a bit more CPU load for saving network bandwidth. GZIP is the protocol default.

GELF via TCP

At the current time, GELF TCP only supports uncompressed and non-chunked payloads. Each message needs to be delimited with a null byte (\0) when sent in the same TCP connection.

Warning

GELF TCP does not support compression due to the use of the null byte (\0) as frame delimiter.

GELF Payload Specification

Version 1.1 (11/2013)

A GELF message is a JSON string with the following fields:

  • version string (UTF-8)

    • GELF spec version – “1.1”; MUST be set by the client library.

  • host string (UTF-8)

    • the name of the host, source or application that sent this message; MUST be set by the client library.

  • short_message string (UTF-8)

    • a short, descriptive message; MUST be set by the client library.

  • full_message string (UTF-8)

    • a long message that can contain a backtrace; optional.

  • timestamp number

    • seconds since UNIX epoch with optional decimal places for milliseconds; SHOULD be set by the client library. If absent, the timestamp will be set to the current time (now).

  • level number

    • the level equal to the standard syslog levels; optional. Default is 1 (ALERT).

  • facility string (UTF-8)

    • optional, deprecated. Send as additional field instead.

  • line number

    • the line in a file that caused the error (decimal); optional, deprecated. Send as an additional field instead.

  • file string (UTF-8)

    • the file (with path, if you want) that caused the error (string); optional, deprecated. Send as an additional field instead.

  • _[additional field] string (UTF-8) or number

    • every field you send and prefix with an underscore (_) is treated as an additional field. Allowed characters in field names are any word character (letter, number, underscore), dashes, and dots. The verifying regular expression is ^[\w\.\-]*$. Libraries SHOULD NOT allow sending id as an additional field (_id); Security Data Lake server nodes omit this field automatically.

Example Payload

This is an example GELF message payload. Any Security Data Lake server node accepts and stores this payload as a message, whether it is sent GZIP- or ZLIB-compressed or uncompressed over a plain socket (without newlines).

Note

New lines must be denoted with the \n escape sequence to ensure the payload is valid JSON as per RFC 7159.

{
  "version": "1.1",
  "host": "example.org",
  "short_message": "A short message that helps you identify what is going on",
  "full_message": "Backtrace here\n\nmore stuff",
  "timestamp": 1385053862.3072,
  "level": 1,
  "_user_id": 9001,
  "_some_info": "foo",
  "_some_env_var": "bar"
}

Note

Currently, the server implementation of GELF in Security Data Lake does not support boolean values. Boolean values are dropped on ingest.

Sending GELF Messages via UDP Using Netcat

Sending an example message to a GELF UDP input (running on host datainsights.example.com on port 12201):

echo -n '{ "version": "1.1", "host": "example.org", "short_message": "A short message", "level": 5, "_some_info": "foo" }' | nc -w0 -u datainsights.example.com 12201
Sending GELF Messages via TCP Using Netcat

Sending an example message to a GELF TCP input (running on host datainsights.example.com on port 12201):

echo -n -e '{ "version": "1.1", "host": "example.org", "short_message": "A short message", "level": 5, "_some_info": "foo" }'"\0" | nc -w0 datainsights.example.com 12201
Sending GELF Messages Using Curl

Sending an example message to a GELF input (running on https://datainsights.example.com:12201/gelf):

curl -X POST -H 'Content-Type: application/json' -d '{ "version": "1.1", "host": "example.org", "short_message": "A short message", "level": 5, "_some_info": "foo" }' 'http://datainsights.example.com:12201/gelf'

GELF Inputs

The Security Data Lake Extended Log Format (GELF) is a log format that avoids the shortcomings of classic plain syslog and is well suited for logging from your application layer. It comes with optional compression, chunking, and, most importantly, a clearly defined structure. GELF messages can be ingested via UDP, TCP, or HTTP inputs, and queue-based transports are also possible.

Some applications, such as Docker, can send GELF messages natively, and fluentd also supports GELF.

There are dozens of GELF libraries for many frameworks and programming languages to get you started. Read more about GELF in the specification.

Note

This input listens for HTTP posts on the /gelf path.

GELF HTTP

You can send in all GELF types via HTTP, including uncompressed GELF, which is simply a plain JSON string. This input supports the configuration of authorization headers, adding password-like protection. When configured, any client making a request must provide the correct authorization header name and value with each request for it to be accepted.

After launching your new input, configure the following fields based on your preferences:

  • Global

    • Select this check box to enable the input on all Security Data Lake nodes, or keep it unchecked to enable the input on a specific node.

  • Title

    • Assign a unique title to the input. Example: “GELF TCP Input for XYZ Source”.

  • Bind address

    • Enter an IP address on which this input listens. The source system/data sends logs to this IP/input.

  • Port

    • Enter a port to use in conjunction with the IP address.

  • Receive Buffer Size (optional)

    • Depending on the amount of traffic being ingested by the input, this value should be large enough to ensure proper flow of data but small enough to prevent the system from spending resources trying to process the buffered data.

  • No. of worker threads (optional)

    • This setting controls how many concurrent threads are used to process incoming data. Increasing the number of threads can enhance data processing speed, resulting in improved throughput. The ideal number of threads to configure depends on the available CPU cores on your Security Data Lake server. A common starting point is to align the number of worker threads with the number of CPU cores. However, it is crucial to strike a balance with other server demands.

Note

The TLS-related settings that follow ensure that only valid sources can send messages to the input securely.

  • TLS cert file (optional)

    • The certificate file that is stored on a Security Data Lake system. The value of this field is a path (/path/to/file) that Security Data Lake should have access to.

  • TLS private key file (optional)

    • The certificate private key file that is stored on a Security Data Lake system. The value of this field is a path (/path/to/file) that Security Data Lake should have access to.

  • Enable TLS

    • Select if this input should use TLS.

  • TLS key password (optional)

    • The private key password.

  • TLS client authentication (optional)

    • If you want to require the source of the messages sending logs to this input to authenticate themselves, set to optional or required.

  • TLS Client Auth Trusted Certs (optional)

    • The path where the client (source) certificates are located on a Security Data Lake system. The value of this field is a path (/path/to/file) which Security Data Lake should have access to.

  • TCP keepalive

    • Enable this option if you want the input to support TCP keep-alive packets to prevent idle connections.

  • Enable Bulk Receiving

    • Enable this option to receive bulk messages separated by newlines (\n or \r\n).

  • Enable CORS

    • Enable Cross-Origin Resource Sharing (CORS) to configure your server to send specific headers in the HTTP response that instruct the browser to allow cross-origin requests.

  • Max. HTTP chunk size (optional)

    • For large payloads, it is common practice to send data in smaller chunks (e.g., 8 KB or 64 KB) to prevent overwhelming buffers. The maximum HTTP chunk size is 65536 bytes.

  • Idle writer timeout (optional)

    • The maximum amount of time the server will wait for a client to send data when writing to an output stream before closing the connection due to inactivity.

  • Authorization Header Name (optional)

    • Specify a custom authorization header name to optionally enforce authentication for all received messages. This is a way to add password-like security for this input.

  • Authorization Header Value (optional)

    • Specify authorization header value to optionally enforce authentication for all received messages.

  • Encoding (optional)

    • All messages need to support the encoding configured for the input. For example, UTF-8 encoded messages should not be sent to an input configured to support UTF-16.

  • Override source (optional)

    • By default, messages parse the source field as the provided hostname in the log message. However, if you want to override this setting for devices that output non-standard or unconfigurable hostnames, you can set an alternate source name here.

  • Decompressed size limit

    • The maximum size of the message after being decompressed.

After launching a GELF HTTP input you can use the following endpoints to send messages:

http://graylog.example.org:[port]/gelf (POST)

Try sending an example message using curl:

curl -XPOST http://graylog.example.org:12202/gelf -p0 -d '{"short_message":"Hello there", "host":"example.org", "facility":"test", "_foo":"bar"}'

Both keep-alive and compression are supported via the common HTTP headers. The server will return a 202 Accepted when the message is accepted for processing.
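
If you configured an Authorization Header Name and Value on the input, each request must include that header. Below is a minimal sketch using the Python standard library; the header name X-Gelf-Token and its value are hypothetical placeholders for whatever you configured on the input.

import json
import urllib.request

payload = {"version": "1.1", "host": "example.org", "short_message": "Hello there", "_foo": "bar"}
req = urllib.request.Request(
    "http://graylog.example.org:12202/gelf",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "X-Gelf-Token": "s3cr3t"},  # hypothetical header name/value
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # 202 when the message is accepted for processing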

Enable Bulk Receiving Option for HTTP GELF Input

Security Data Lake provides users with an option to enable bulk receiving of messages via HTTP GELF input, which allows bulk receiving of messages separated by newline characters.

The input will automatically separate multiple GELF messages, which are newline-delimited (\n or \r\n) when this option is enabled.

Example curl request:

curl -XPOST -v http://127.0.0.1:12202/gelf -p0 \
-d $'{"short_message":"Bulk message 1", "host":"example.org", "facility":"test", "_foo":"bar"}\r\n\
{"short_message":"Bulk message 2", "host":"example.org", "facility":"test", "_foo":"bar"}\r\n\
{"short_message":"Bulk message 3", "host":"example.org", "facility":"test", "_foo":"bar"}\r\n\
{"short_message":"Bulk message 4", "host":"example.org", "facility":"test", "_foo":"bar"}\r\n\
{"short_message":"Bulk message 5", "host":"example.org", "facility":"test", "_foo":"bar"}'

Note

The HTTP GELF input already supports Transfer-Encoding: chunked, and that support extends to the Bulk Receiving feature when the Enable Bulk Receiving option is turned on.

Warning

Individual GELF messages must be valid JSON without internal line breaks. Posting pretty-printed (multi-line) JSON to this input results in an error.

GELF TCP

After launching your new input, configure the following fields based on your preferences:

  • Title

    • Assign a unique title to the input. Example: “GELF TCP Input for XYZ Source”.

  • Bind address

    • Enter an IP address on which this input listens. The source system/data sends logs to this IP/input.

  • Port

    • Enter a port to use in conjunction with the IP address.

  • Receive Buffer Size (optional)

    • Depending on the amount of traffic being ingested by the input, this value should be large enough to ensure proper flow of data but small enough to prevent the system from spending resources trying to process the buffered data.

  • No. of worker threads

    • This setting controls how many concurrent threads are used to process incoming data. Increasing the number of threads can enhance data processing speed, resulting in improved throughput. The ideal number of threads to configure depends on the available CPU cores on your Security Data Lake server. A common starting point is to align the number of worker threads with the number of CPU cores. However, it is crucial to strike a balance with other server demands.

Note

The TLS-related settings that follow ensure that only valid sources can send messages to the input securely.

  • TLS cert file (optional)

    • The certificate file that is stored on a Security Data Lake system. The value of this field is a path (/path/to/file) that Security Data Lake should have access to.

  • TLS private key file (optional)

    • The certificate private key file that is stored on a Security Data Lake system. The value of this field is a path (/path/to/file) that Security Data Lake should have access to.

  • Enable TLS

    • Select if this input should use TLS.

  • TLS key password (optional)

    • The private key password.

  • TLS client authentication (optional)

    • If you want to require the source of the messages sending logs to this input to authenticate themselves, set to optional or required.

  • TLS Client Auth Trusted Certs (optional)

    • The path where the client (source) certificates are located on a Security Data Lake system. The value of this field is a path (/path/to/file) which Security Data Lake should have access to.

  • TCP keepalive

    • Enable this option if you want the input to support TCP keep-alive packets to prevent idle connections.

  • Null frame delimiter

    • This option is typically left unchecked. New line is the delimiter for each message.

  • Maximum message size

    • The maximum size of each message. The default value should suffice but can be modified depending on message length. Each input type usually has specifications that note the maximum length of a message.

  • Override Source

    • By default, messages parse the source field as the provided hostname in the log message. However, if you want to override this setting for devices that output non-standard or unconfigurable hostnames, you can set an alternate source name here.

  • Encoding

    • All messages must use the encoding configured for the input. For example, UTF-8 encoded messages should not be sent to an input configured for UTF-16.

  • Decompressed size limit

    • The maximum size of the message after being decompressed.

GELF UDP

After launching your new input, configure the following fields based on your preferences:

  • Global

    • Select this check box to enable the input on all Security Data Lake nodes, or keep it unchecked to enable the input on a specific node.

  • Title

    • Assign a unique title to the input. Example: “GELF UDP Input for XYZ Source”

  • Bind address

    • Enter an IP address that this input will listen on. The source system/data sends logs to this IP/input.

  • Port

    • Enter a port to use in conjunction with the IP address.

  • Receive Buffer Size (optional)

    • Depending on the amount of traffic being ingested by the input, this value should be large enough to ensure proper flow of data but small enough to prevent the system from spending resources trying to process the buffered data.

  • No. of worker threads

    • This setting controls how many concurrent threads are used to process incoming data. Increasing the number of threads can enhance data processing speed, resulting in improved throughput. The ideal number of threads to configure depends on the available CPU cores on your Security Data Lake server. A common starting point is to align the number of worker threads with the number of CPU cores. However, it is crucial to strike a balance with other server demands.

  • Override Source

    • By default, messages parse the source field as the provided hostname in the log message. However, if you want to override this setting for devices that output non-standard or unconfigurable hostnames, you can set an alternate source name here.

  • Encoding

    • All messages must use the encoding configured for the input. For example, UTF-8 encoded messages should not be sent to an input configured for UTF-16.

  • Decompressed size limit

    • The maximum size of the message after being decompressed.

GELF Kafka Input

The GELF Kafka input supports collecting logs from Kafka topics with the help of Filebeats. Once logs are generated by the system and pushed to a Kafka topic, they are automatically ingested by this input.

Prerequisites
  • Install Beats, Kafka, and Zookeeper.

  • Provide full access permissions to all Kafka and Filebeats folders.

  • Configure the filebeats.yml file as shown below:

    Note

    Remember to replace localhost with your unique IP address.

    Beats Kafka.png
  • Configure the Kafka server.properties file with advertised.listeners=PLAINTEXT://localhost:9092.

  • Create a Kafka topic.

    • Go to the Kafka directory bin folder and execute the following command:

      ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic <Topic name>

Create GELF Kafka Input

To launch a new GELF Kafka input:

  1. Navigate to System > Inputs.

  2. Select GELF Kafka from the input options and click the Launch new input button.

    GELF Kafka.png
  3. Enter your configuration parameters in the pop-up configuration form.

Configuration Parameters
  • Title

    • Assign a title to the input. Example: “GELF Kafka Input for XYZ Source”.

  • Bootstrap Servers (optional)

    • Enter the IP Address and port on which the Kafka server is running.

  • ZooKeeper address (legacy mode only) (optional)

    • Enter the IP address and port on which ZooKeeper is running.

  • Topic filter regex

    • Enter the topic name filter which is configured in the filebeats.yml file.

  • Fetch minimum bytes

    • Enter the minimum byte size a message batch should reach before fetching.

  • Fetch maximum wait time (ms)

    • Enter the maximum time (in milliseconds) to wait before fetching.

  • Processor threads

    • Enter the number of processor threads to use. Base this on the number of partitions available for the topic.

  • Allow throttling this input

    • If enabled, no new messages are read from this input until Security Data Lake catches up with its message load. This configuration parameter is typically useful for inputs reading from files or message queue systems like AMQP or Kafka. If you regularly poll an external system, for example via HTTP, you should leave this option disabled.

  • Auto offset reset (optional)

    • Choose the appropriate selection from the drop down menu if there is no initial offset in Kafka or if an offset is out of range.

  • Consumer group id (optional)

    • Enter the name of the consumer group the Kafka input belongs to.

  • Override source (optional)

    • By default, the source is the hostname derived from the received packet. Only set this field if you want to override it with a custom string.

  • Encoding (optional)

    • The default encoding is UTF-8. Set this to a standard charset name if you want to override the default. All messages must use the encoding configured for the input; for example, UTF-8 encoded messages should not be sent to an input configured for UTF-16.

  • Decompressed size limit

    • The maximum size of the message after being decompressed.

  • Custom Kafka properties (optional)

    • Provide additional properties to Kafka by separating them in a new line.
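
Once the input is running, you can publish a test GELF-formatted message to the configured topic to confirm end-to-end ingestion. The sketch below assumes the third-party kafka-python package and uses placeholder broker and topic names; any Kafka client works.

import json
from kafka import KafkaProducer  # third-party kafka-python package (assumed)

# Publish one GELF-formatted test message to the topic consumed by the input.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda value: json.dumps(value).encode("utf-8"),
)
producer.send("gelf-topic", {"version": "1.1", "host": "example.org", "short_message": "Kafka test message"})
producer.flush()
producer.close()

If the message does not appear in Security Data Lake, check that the topic name matches the Topic filter regex configured on the input.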

Google Workspace Input

The Google Workspace input collects logs from Google BigQuery using the Google Workspace logs and reports export feature. When users perform actions in services such as Docs, Gmail, or Chat, the related log entries are pushed to BigQuery and ingested automatically. After ingestion, the input deletes the consumed logs.

To configure the Google Workspace Input, follow these steps:

1. Make sure the prerequisites are met

Make sure you have an active Google Workspace and Google Cloud subscription, and install the Graylog Illuminate Google Workspace content pack.

2. Configure Google Cloud

Follow these steps to configure the Google Cloud environment for integration with the Input:

  1. Select an existing Google Cloud project or create a new one.

  2. Make sure that Cloud Billing is enabled.

  3. Create a new Service Account for the input.

  4. Grant the account the BigQuery Editor role.

  5. Create a key for the service account and export it in JSON format.

    Note

    This key is needed to authorize the input to interact with BigQuery.

  6. Create a new BigQuery dataset.

    Note

    Log messages will later be exported here.
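
Before moving on, you can optionally confirm that the exported service account key can reach the new dataset. The sketch below assumes the third-party google-cloud-bigquery package; the key file name and dataset ID are placeholders.

from google.cloud import bigquery  # third-party google-cloud-bigquery package (assumed)
from google.oauth2 import service_account

# Load the key exported in step 5 and check that the dataset from step 6 is reachable.
credentials = service_account.Credentials.from_service_account_file("service-account-key.json")
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

dataset = client.get_dataset("workspace_logs")  # placeholder dataset ID
print(f"Dataset {dataset.dataset_id} is reachable in project {dataset.project}.")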

3. Configure Google Workspace

After completing the Google Cloud configuration, sign in to the Google Workspace Admin console and enable the BigQuery export option.

For more information, refer to the Google documentation.

4. Configure the Input in Security Data Lake

Configure the input by entering the following values:

Fields

Values

Input Name

A unique name for the input

Service Account Key

The key JSON file exported during the Google Cloud setup.

Note

This key is required to authorize the input to connect to BigQuery.

BigQuery Dataset Name

The dataset name configured in BigQuery.

Log Types to Collect

Select the desired Google Workspace log types.

Polling Interval

Determines how often (in minutes) Security Data Lake checks for new data in BigQuery tables.

Advanced options

Enable Throttling

If enabled, no new messages are read from this input until Security Data Lake catches up with its message load.

Page size

Provide the maximum number of logs to return per page of query results. The default setting is 1000.

Lag time offset

Provide the lag time in hours to account for the initial delay before activity data is populated to the BigQuery tables.

Store Full Message

Stores the full JSON workspace log message in the full_message field.

Warning

Enabling this option may result in a significant increase in the amount of data stored.

IPFIX Input

IPFIX input allows Security Data Lake to read IPFIX logs. The input supports all of the standard IANA fields by default.

Note

Installation of an additional graylog-integrations-plugins package is required.

IPFIX Field Definitions

Any additional vendor- or hardware-specific fields that are collected need to be defined in a JSON file. The file needs to provide the private enterprise number, as well as the additional field definitions that are being collected. Structure the JSON file according to the example below.

Example of JSON File

Provide the filepath of the JSON file with additional collected fields in the IPFIX field definitions option.

{
  "enterprise_number": PRIVATE ENTERPRISE NUMBER,
  "information_elements": [
    {
      "element_id": ELEMENT ID NUMBER,
      "name": "NAME OF DEFINITION",
      "data_type": "ABSTRACT DATA TYPE"
    },
    ...
    {
      "element_id": ELEMENT ID NUMBER,
      "name": "NAME OF DEFINITION",
      "data_type": "ABSTRACT DATA TYPE"
    }
  ]
}

IPFIX Data Types

IPFIX Data Types.png
IPFIX Input Modal.png

JSON Path from HTTP API Input

The HTTP API input with JSON Path reads JSON responses from REST resources and extracts field values, which it stores as Security Data Lake messages.

Note

This input can only extract JSON primitive values (such as numbers, text, or strings) and cannot target objects or arrays.

To configure the JSON Path from HTTP API Input, follow these steps:

1. Make sure the prerequisites are met

Before configuring the HTTP API input in Security Data Lake, ensure that:

  • The target REST API endpoint is accessible from the Security Data Lake server.

  • The API returns valid JSON responses.

  • You know the JSONPath of the value you want to extract.

  • If the API requires authentication, you have the credentials, tokens, or headers needed to access it.

2. Configure the Input in Security Data Lake

Configure the input by entering the following values:

Fields

Values

Node

Select the node on which to start this input.

Title

Provide a unique name for your input.

URI of JSON resource

Enter the URI for a resource that returns JSON on an HTTP request.

Interval

Set the time between collector runs. The time unit is set in the next field.

Example: If you set the Interval to 5 and the Interval time unit to minutes, then the collector runs every 5 minutes.

Interval time unit

Select a time unit for the interval between collector runs.

JSON path of data to extract

Enter the JSONPath expression that specifies the value to extract from the JSON response.

For more information, refer to Use Case.

Message source

Specify the value to use for the source field in the resulting message.

Enable Throttling

Allows Security Data Lake to pause data ingestion for this input when message processing falls behind, letting the system catch up.

HTTP method (optional)

Select the HTTP method for the request. The default is GET.

HTTP body (optional)

Enter the HTTP request body. This field is required if the HTTP method is set to POST or PUT.

HTTP content type (optional)

Select the HTTP content type for the request. This field is required if the HTTP method is POST or PUT.

Additional, sensitive HTTP headers (optional)

Enter a comma-separated list of HTTP headers that contain sensitive information, such as authorization credentials.

Example: Authorization: Bearer <token>

Additional HTTP headers (optional)

Enter a comma-separated list of additional HTTP headers.

Example: Accept: application/json, X-Requester: Data Insights

Override source (optional)

By default, the source field uses the hostname from the received packet. You can override this with a custom string to better identify or categorize the source.

Encoding (optional)

Messages must use the same encoding configured for the input. For example, UTF-8 messages should not be sent to an input set to UTF-16.

Flatten JSON

Select this option to flatten the entire JSON response and return the result as message fields.

Example: source = github, JSON path = $.download_count, interval time unit = Minutes

Use Case

The following example retrieves the download count for a specific release package from GitHub:

$ curl -XGET https://api.github.com/repos/YourAccount/YourRepo/releases/assets/12345
{
  "url": "https://api.github.com/repos/YourAccount/YourRepo/releases/assets/12345",
  "id": 12345,
  "name": "somerelease.tgz",
  "label": "somerelease.tgz",
  "content_type": "application/octet-stream",
  "state": "uploaded",
  "size": 38179285,
  "download_count": 9937,
  "created_at": "2013-09-30T20:05:01Z",
  "updated_at": "2013-09-30T20:05:46Z"
}

In this example, the target attribute is download_count, so the JSONPath expression is set to $.download_count.

The extracted value appears in Security Data Lake as a message similar to the following:

JSON path example.png

You can use Security Data Lake to analyze your download counts now.
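
For comparison, the same extraction performed outside Security Data Lake, using only the Python standard library and the example URL above, looks like this:

import json
import urllib.request

# Fetch the asset JSON and read the top-level download_count key, which is
# what the JSONPath expression $.download_count selects.
url = "https://api.github.com/repos/YourAccount/YourRepo/releases/assets/12345"
with urllib.request.urlopen(url) as resp:
    asset = json.load(resp)

print(asset["download_count"])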

Use JSONPath

JSONPath can do more than select a single field. For example, you can use it to retrieve the first download_count from a list of releases where the state field is set to uploaded:

$.releases[?(@.state == 'uploaded')][0].download_count

You can select only the first download_count in the list:

$.releases[0].download_count

For more information on JSONPath, review this KB article.

Microsoft Defender for Endpoint Input

Microsoft Defender for Endpoint is a cloud-based endpoint security solution that provides protection for enterprise devices with a range of security features, such as asset management, security baselines, vulnerability assessment, and advanced threat protection.

To configure the Microsoft Defender for Endpoint Input, follow these steps:

1. Make sure the prerequisites are met

To use the Microsoft Defender for Endpoint plugin, create and authorize a client application in your organization’s Microsoft Azure portal. Security Data Lake then polls Microsoft Defender for Endpoint at defined intervals and ingests new logs automatically.

2. Make the necessary configurations in Azure

  1. Log in to Microsoft Azure.

  2. Select Microsoft Entra ID from the menu on the left side of the page.

  3. From the menu on the left side of the page, select Manage > App Registrations.

  4. Select New Registration from the right side of the page.

  5. Register a new application by following these steps:

    1. Provide a name for the application, such as Security Data Lake Log Access.

    2. Select the appropriate account type: either Single Tenant or Multitenant, depending on whether your organization uses one or multiple Active Directory instances.

    3. Select Register.

      Warning

      Do not add a Redirect URI.

    Once the application is created, the following fields are automatically generated:

    • Application (client) ID

    • Directory (tenant) ID

  6. For the newly created application, go to Certificates & Secrets.

  7. Select New Client Secret.

  8. Add a description for the new secret, select an expiration time, and then select Add.

  9. Write down the Application (client) ID, Directory (tenant) ID, and Client Secret. You will need these values when configuring the input.

3. Create the necessary client application permissions in Azure

  1. For the newly created application, go to API Permissions.

  2. Select Add a permission.

  3. Select APIs my organization uses.

  4. Search for WindowsDefenderATP.

  5. Select WindowsDefenderATP.

  6. Select these permissions and click Add permissions:

    • Alert.Read.All

    • Alert.ReadWrite.All

    • User.Read.All

    • Vulnerability.Read.All

    • Machine.Read.All

  7. Select Grant admin consent for...

  8. Select Yes in the pop-up dialog to confirm.

4. Configure the Input in Security Data Lake

Configure the input by entering the following values:

Tip

You will need the Client ID, Tenant ID, and Client Secret Value from step 2 to proceed.

Fields

Values

Input Name

Enter a unique name for the input.

Directory (tenant) ID

The ID of the Active Directory instance for which Security Data Lake will collect log data.

Application (client) ID

The ID of the Client Application created during step 2.

Client Secret Value

This is the client secret value generated during step 2.

Polling Interval

Specifies how often, in minutes, the input checks for new log data. The default is 5 minutes, which is recommended. The value must not be less than 1 minute.

Enable Throttling

Allows Security Data Lake to pause reading new data for this input when message processing falls behind, giving the system time to catch up.

Store Full Message

Allows Security Data Lake to store raw log data in the full_message field for each message. Enabling this option can significantly increase storage usage.

Microsoft Graph Input

The Microsoft Graph input supports collecting email logs, Microsoft Entra ID logs, directory, provisioning, and sign-in audit logs using Microsoft Graph APIs. See the official documentation for more information about the Microsoft Graph API.

1. Make sure the prerequisites are met

Make sure that the following prerequisites are met:

  • You must have an existing Entra ID account.

  • An API user must be defined with the following permissions for the supported log types:

    Log Type

    Permissions

    License Requirements

    Email Logs

    User.ReadAll, User.ReadBasic.All, Mail.Read, Mail.ReadBasic, Mail.ReadBasic.All, Mail.ReadWrite

    Microsoft Office 365 Business

    Directory Audit logs

    AuditLog.Read.All, Directory.Read.All, Directory.ReadWrite.All

    Sign In Audit logs

    AuditLog.Read.All

    At least Microsoft Entra P1 or P2

    Provisioning Audit logs

    AuditLog.Read.All

2. Configure an Azure app

Follow the official Microsoft instructions to create a new Azure app and generate the credentials required for authentication. During setup, record the Client ID, Tenant ID, and Client Secret. These values are needed when configuring the input in Security Data Lake.
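
To confirm that the recorded Tenant ID, Client ID, and Client Secret work before configuring the input, you can request a token via the Microsoft identity platform client-credentials flow. This is an optional sanity check sketched with the Python standard library; replace the placeholder values with your own.

import json
import urllib.parse
import urllib.request

TENANT_ID = "<tenant-id>"
CLIENT_ID = "<client-id>"
CLIENT_SECRET = "<client-secret>"

# Request an application token for Microsoft Graph using client credentials.
token_url = f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token"
body = urllib.parse.urlencode({
    "grant_type": "client_credentials",
    "client_id": CLIENT_ID,
    "client_secret": CLIENT_SECRET,
    "scope": "https://graph.microsoft.com/.default",
}).encode("ascii")

with urllib.request.urlopen(urllib.request.Request(token_url, data=body)) as resp:
    token = json.load(resp)
print("Access token acquired." if "access_token" in token else token)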

3. Configure the Input in Security Data Lake

Configure the input by entering the following values:

Fields

Values

Input Name

Enter a unique name for the input.

Tenant ID

Specify the Tenant ID of the Microsoft Entra ID account used to collect log data.

Client ID

Enter the Client ID of the application registered in the Microsoft Entra ID account.

Client Secret

Enter the Client Secret generated for the registered application in the Microsoft Entra ID account.

Subscription Type

Select the Azure AD subscription type for your organization.

Log Types to Collect

Select the log types to collect from Microsoft Graph. All log types are selected by default. At least one log type must be selected.

Polling Interval

Specifies how often, in minutes, the input checks for new data in Microsoft Graph. The minimum allowable interval is 5 minutes.

Read Time Offset (minutes)

Defines how long the input waits for new logs to become available in Microsoft Graph before attempting to read them.

Enable Throttling

Allows the system to temporarily pause reading new data from this input when message processing falls behind, enabling it to catch up.

Microsoft Office 365 Input

Microsoft Office 365 is a widely used cloud-based suite of productivity tools. The Office 365 input pulls your organization’s Office 365 logs into Security Data Lake for processing, monitoring, and alerting.

Note

While Microsoft has rebranded their Office 365 product to Microsoft 365, the following input as documented remains unaffected by this change.

1. Make sure the prerequisites are met

Make sure that the following prerequisites are met:

  • An authorized client application created and approved in your organization’s Microsoft Azure portal to enable the Office 365 plugin.

  • An active Office 365 subscription with access to audit logs and the Microsoft Azure portal.

    Note

    Accounts with E5 or A5 licenses generally include the required access, but you should make sure this is the case.

Security Data Lake polls the Office 365 audit log at defined intervals and ingests new logs automatically.

2. Configure an app in Azure

  1. Log in to Microsoft Azure.

  2. Select Azure Active Directory from the menu on the left side of the screen.

  3. From the menu on the left side of the page, select Manage > App Registrations.

  4. Select New Registration from the right side of the page.

  5. Register a new application by following these steps:

    1. Provide a name for the application, such as Security Data Lake Log Access.

    2. Select the appropriate account type: either Single Tenant or Multitenant, depending on whether your organization uses one or multiple Active Directory instances.

    3. Select Register.

      Warning

      Do not add a Redirect URI.

    Once the application is created, the following fields are automatically generated:

    • Application (client) ID

    • Directory (tenant) ID

  6. For the newly created application, go to Certificates & Secrets.

  7. Select New Client Secret.

  8. Add a description for the new secret, select an expiration time, and then select Add.

  9. Write down the Application (client) ID, Directory (tenant) ID, and Client Secret. You will need these values when configuring the input.

3. Create the necessary client application permissions in Azure

  1. For the newly created application, go to API Permissions.

  2. Select Add a permission.

  3. Select Office 365 Management APIs.

  4. Select Application Permissions.

  5. Select all available permissions on the list, then select Add permissions.

  6. Select Grant admin consent for...

  7. Select Yes in the pop-up dialog to confirm.

Enable Unified Audit Logging

Go to the Audit Log Search page in Microsoft Purview and select Start recording user and admin activity to enable audit logging.

Enable Audit Logging for Office365.png

It may take up to 24 hours for logs to appear in Security Data Lake after Unified Audit Log is first enabled. We recommend waiting 24 hours before starting the Office 365 input setup in Security Data Lake to ensure the Azure subscription is properly configured for audit logging.

If the blue button labeled Start recording user and admin activity is not visible, audit logging is already enabled, and you can proceed with the remaining configuration steps.

4. Configure the Input in Security Data Lake

O365 Connection Configuration

Fields

Values

Input Name

Enter a unique name for the Office 365 input.

Directory (tenant) ID

Specify the ID of the Active Directory instance from which Security Data Lake will collect log data.

Application (client) ID

Enter the ID of the client application created in the Microsoft Azure portal.

Client Secret Value

Enter the client secret value generated for the registered application.

Subscription Type

Select the type of Office 365 subscription your organization uses.

Enterprise and GCC Government Plans are the most common options.

O365 Content Subscription

Fields

Values

Log Types to Collect

Specifies which of the five available log types the input retrieves from Office 365. All options are selected by default: Azure Active Directory, SharePoint, Exchange, General, and DLP.

Polling Interval

Specifies how often, in minutes, the input checks for new log data.

The default interval is 5 minutes, which is recommended. The value must not be less than 1 minute.

Drop DLP Logs Containing Sensitive Data

Office 365 generates a summary log (without sensitive data) and a detailed log (with sensitive data) for each DLP event. Enabling this option drops detailed logs to prevent sensitive data from being stored in Security Data Lake.

Enable Throttling

Allows Security Data Lake to pause reading new data for this input when message processing falls behind, giving the system time to catch up.

Store Full Message

Allows Security Data Lake to store raw log data in the full_message field for each message.

Enabling this option can significantly increase storage usage.

Mimecast Input

The Mimecast input enables the collection of email security logs using Mimecast APIs, providing seamless integration with Security Data Lake for enhanced email threat analysis and monitoring. This input pulls logs from version 2.0 of the Mimecast API.

Note

This information applies to the Mimecast input (v2.0 API). The Mimecast input (v1.0 API) has been deprecated.

1. Make sure the prerequisites are met

Make sure that the following prerequisites are met:

  • An existing Mimecast account.

  • Please refer to the official Mimecast documentation for setting up an API application.

  • An API user with the Mimecast Administrator role and granted the following permissions:

    Log Type

    API Permissions

    Archive Message View Logs

    Archive, View Logs, Read

    Archive Search Logs

    Archive, Search Logs, Read

    Audit Events

    Account, Logs, Read

    DLP Logs

    Monitoring, Data Leak Prevention, Read

    Message Release Logs

    Monitoring, Held, Read

    Rejection Logs

    Monitoring, Rejections, Read

    Search Logs

    Archive, Search Logs, Read

    TTP Attachment Protection Logs

    Monitoring, Attachment Protection, Read

    TTP Impersonation Protect Logs

    Monitoring, Impersonation Protection, Read

    TTP URL Logs

    Monitoring, URL Protection, Read

2. Set up an API application in Mimecast

Refer to the Mimecast documentation for instructions on creating and configuring an API application.

3. Configure the Input in Security Data Lake

Configure the input by entering the following values:

Fields

Values

Input Name

Enter a unique, user-defined name for the input.

Client ID

Enter the Client ID associated with your Mimecast API application.

Client Secret

Enter the Client Secret generated for your Mimecast API application.

Log Types to Collect

Select the log types to collect. All log types are selected by default. At least one log type must be selected.

Polling Interval

Specifies how often, in minutes, Security Data Lake checks for new data from the Mimecast APIs. The minimum allowable interval is 5 minutes.

Enable Throttling

Allows Security Data Lake to pause reading new data from this input when message processing falls behind, giving the system time to catch up.

NetFlow Input

NetFlow, a network protocol developed by Cisco, provides IP traffic data for monitoring and analysis. With Security Data Lake, you can collect IP flow data including source, destination, service data, and other associated data points. Support for NetFlow export is device-dependent.

Configure NetFlow Input in Security Data Lake

After launching your new input, configure the following fields based on your preferences:

  • Global

    • Select this check box to enable the input on all Security Data Lake nodes, or keep it unchecked to enable the input on a specific node.

  • Node

    • Select the Security Data Lake node this input will be associated with.

  • Title

    • Assign a title for the input. Example: “NetFlow Input for XYZ Source”.

  • Bind Address

    • Enter an IP address that this input will listen on. The source system/data will send logs to this IP/Input.

  • Port:

    • Enter a port to use in conjunction with the IP. The default port of 2055 is the standard for most devices. However, if you need multiple inputs, you need to refer to vendor documentation on other port options (9555, 9995, 9025, and 9026 are common options).

  • Receive Buffer Size (optional)

    • This setting determines the size of the buffer that stores incoming data before it is processed. A larger buffer can accommodate more data, reducing the chance of data loss during high traffic periods. Depending on the amount of traffic being ingested by the input, this value should be large enough to ensure proper flow of data but small enough to prevent the system from spending resources trying to process the buffered data. The optimal size depends on your network traffic volume. Security Data Lake's default setting is somewhat conservative at 256 KB for testing and small deployments, so if you are dealing with high volumes of NetFlow data, increasing this value is advised. A practical recommendation is to start with a buffer size of at least 1 MB (1024 KB) and adjust based on observed performance.

  • No. of worker threads

    • This setting controls how many concurrent threads are used to process incoming data. Increasing the number of threads can enhance data processing speed, resulting in improved throughput. The ideal number of threads to configure depends on the available CPU cores on your Security Data Lake server. A common starting point is to align the number of worker threads with the number of CPU cores. However, it is crucial to strike a balance with other server demands.

  • Override source (optional)

    • By default, messages parse the source field as the provided hostname in the log message. However, if you want to override this setting for devices that output non-standard or unconfigurable hostnames, you can set an alternate source name here.

  • Encoding (optional)

    • All messages need to support the encoding configured for the input. For example, UTF-8 encoded messages should not be sent to an input configured to support UTF-16.

  • NetFlow 9 field definitions (optional)

    • NetFlow v9 field definitions specify how each data type is interpreted. It is crucial to define fields accurately to ensure that the collected NetFlow data is correctly parsed and understood. You should customize field definitions to match the specific types of data your network devices export.

      Below is a sample .yml file structure for defining NetFlow v9 fields. This example includes commonly exported fields by a Juniper Networks EX series switch. Please note that the actual fields and their IDs may vary depending on the switch configuration and NetFlow version. Adjust these definitions based on the specific NetFlow data exported by your Juniper Networks EX series switch.

netflow_definitions:
  # Basic flow fields
  - id: 1
    name: IN_BYTES
    type: UNSIGNED64
    description: Incoming counter with length N x 8 bits for the number of bytes associated with an IP Flow.
  - id: 2
    name: IN_PKTS
    type: UNSIGNED64
    description: Incoming counter with length N x 8 bits for the number of packets associated with an IP Flow.
  - id: 10
    name: INPUT_SNMP
    type: UNSIGNED32
    description: Input interface index. Use this value to query the SNMP IF-MIB.
  - id: 14
    name: OUT_BYTES
    type: UNSIGNED64
    description: Outgoing counter with length N x 8 bits for the number of bytes associated with an IP Flow.
  - id: 15
    name: OUT_PKTS
    type: UNSIGNED64
    description: Outgoing counter with length N x 8 bits for the number of packets associated with an IP Flow.
  - id: 16
    name: OUTPUT_SNMP
    type: UNSIGNED32
    description: Output interface index. Use this value to query the SNMP IF-MIB.
Device Sampling Rate

The following table includes recommended sampling rates for your devices based on average traffic volume.

Data Volume (95th percentile)

Recommended Sampling Rate

< 25 Mb/s

1 in 1

< 100 Mb/s

1 in 128

< 400 Mb/s

1 in 256

< 1 Gb/s

1 in 512

< 5 Gb/s

1 in 1024

< 25 Gb/s

1 in 2048

Okta Log Events Input

The Okta System Log records events related to your organization and provides an audit trail of platform activity. This input retrieves Okta Log Event objects and ingests them into Security Data Lake for further analysis of organizational activity.

To configure the Okta Log Events Input, follow these steps:

1. Make sure the prerequisites are met

Make sure that the following prerequisites are met:

  • An active Okta organization account with administrative access.

  • API access enabled for your Okta tenant.

  • An API token generated from the Okta Admin Console. The token must have sufficient permissions to read System Log events.

  • The Okta System Log API endpoint URL, typically in the format:

    https://<your_okta_domain>/api/v1/logs
  • Network access from the Security Data Lake server to the Okta API endpoint (port 443, HTTPS).
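
You can optionally verify the token and network access before configuring the input. The sketch below uses the Python standard library; the domain and token are placeholders, and Okta API tokens use the SSWS authorization scheme.

import json
import urllib.request

OKTA_DOMAIN = "<your_okta_domain>"
API_TOKEN = "<your_api_token>"

# Request a single recent System Log event to confirm the token and connectivity.
req = urllib.request.Request(
    f"https://{OKTA_DOMAIN}/api/v1/logs?limit=1",
    headers={"Authorization": f"SSWS {API_TOKEN}", "Accept": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    events = json.load(resp)
print(f"Received {len(events)} log event(s).")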

2. Configure the Input in Security Data Lake

Configure the input by entering the following values:

Fields

Values

Domain Name

Enter your Okta domain (also known as the Okta URL). Copy the domain from the Okta Developer Console.

For more information, see Find your domain.

API Key

Enter the API token used to authenticate Security Data Lake’s requests to Okta. Create an API token from the Okta Developer Console.

For details, see Create an Okta API token.

Pull Log Events Since

Specifies the earliest time for the Okta log events to collect. This determines how much historical data Security Data Lake pulls when the input starts.

If not provided, Security Data Lake retrieves one polling interval of historical data. The timestamp must be in ISO-8601 format.

Polling Interval

Defines how often Security Data Lake polls Okta for new log data. The value cannot be less than 5 seconds.

Keyword Filter (optional)

Filters log event results based on specified keywords. You can use up to 10 space-separated keywords, each with a maximum length of 40 characters.

Okta Log Events Input Modal.png

OpenTelemetry (gRPC) Input

The OpenTelemetry Google Remote Procedure Call (gRPC) input allows Security Data Lake to ingest log data from OpenTelemetry-instrumented applications and services using the OpenTelemetry Protocol (OTLP) over gRPC.

By using this input, you can receive structured OpenTelemetry logs, map relevant fields to Security Data Lake’s internal schema, and apply search, analysis, and alerting capabilities to your telemetry data.

Prerequisites

Before proceeding, ensure that the following prerequisites are met:

  • Install the OpenTelemetry Collector.

  • To send OpenTelemetry logs to Security Data Lake, the Security Data Lake server needs to be configured as a backend for logs. The following configuration snippets provide examples for configuring the OpenTelemetry Collector to send logs to Security Data Lake; however, they do not represent a complete configuration of the collector.

    • Insecure, unauthenticated log exporting.

      Warning

      This configuration is insecure and not recommended for production. Only use for testing!

      exporters:
        otlp/graylog:
          endpoint: "graylog.test:4317"
          tls:
            insecure: true
      
      service:
        pipelines:
          logs:
            exporters: [debug, otlp/graylog]
    • TLS, Bearer Token Authentication

      extensions:
        bearertokenauth/withscheme:
          scheme: "Bearer"
          token: "kst40ngmpq22oqej9ugughgh48i81n0vbm0tbuqnqk0oop5jl0h"
      
      exporters:
        otlp/graylog:
          endpoint: "graylog.test:4317"
          auth:
            authenticator: bearertokenauth/withscheme
          tls:
            ca_file: /tls/rootCA.pem
      
      service:
        extensions: [bearertokenauth/withscheme]
        pipelines:
          logs:
            exporters: [debug, otlp/graylog]
    • TLS, unauthenticated

      exporters:
        otlp/graylog:
          endpoint: "graylog.test:4317"
          tls:
            ca_file: /tls/rootCA.pem
      
      service:
        pipelines:
          logs:
            exporters: [debug, otlp/graylog]
    • Mutual TLS

      exporters:
        otlp/graylog:
          endpoint: "graylog.test:4317"
          tls:
            ca_file: /tls/rootCA.pem
            cert_file: /tls/client.pem
            key_file: /tls/client-key.pem
      
      service:
        pipelines:
          logs:
            exporters: [debug, otlp/graylog]

Transport Layer Security (TLS)

To ensure secure communication, the OpenTelemetry (gRPC) input provides multiple authentication and encryption mechanisms that can be used separately or combined for additional security. It is possible to:

  • Use TLS only, where the data is encrypted but no client authentication is enforced.

  • Enable mTLS, ensuring that only trusted clients with valid certificates can connect.

  • Use Bearer Token Authentication as an alternative to mTLS, requiring clients to authenticate via a token.

Security Data Lake Input Configuration

When launching this input from the Security Data Lake Inputs tab, configure the following field values:

  • Node: The node setting determines whether the input should run on a specific Security Data Lake node or be available globally across all nodes. Click the Global checkbox to run the input across all nodes.

  • Title: Assign a title to the input for easy identification.

  • Bind address: Enter an IP address for this input to listen on. The source system/data sends logs to this IP address/Input.

  • Port: By default, the input listens on 0.0.0.0, making it accessible on all network interfaces, and uses port 4317, which aligns with OpenTelemetry’s default for gRPC-based log ingestion.

  • Maximum size of gRPC inbound messages: The maximum size of each inbound gRPC message. The default value of 4194304 bytes (approximately 4 MB) should suffice but can be modified depending on message length.

  • Allow Throttling (checkbox): If enabled, the input temporarily stops reading new messages if Security Data Lake’s internal processing queue reaches its limit. Throttling prevents excessive memory usage and ensures system stability, particularly in high-throughput environments. To ensure that log data is not lost, implement appropriate retry behavior. The OpenTelemetry SDK and collector generally support retry mechanisms for transient failures, but you should verify that their configuration aligns with expected backoff and retry policies.

  • Required bearer token (optional): In addition to TLS or as an alternative authentication method, the input supports Bearer Token Authentication. When a required Bearer Token is defined, all clients must include this token in the authorization header of their requests. This method allows for access control without requiring client-side certificates.

  • Allow Insecure Connections (checkbox): Disable TLS encryption to allow insecure connections to the server.

    Note

    The TLS-related settings that follow ensure that valid sources can send messages to the input securely.

  • TLS Server Certificate Chain (optional): TLS encryption can be enabled to protect log data in transit. To activate TLS, you must provide a TLS Server Certificate Chain, which includes a PEM-encoded certificate used to authenticate the input.

  • TLS Server Private Key (optional): The private key corresponding to the server certificate, which is required to support encrypted communication. If the certificate is signed by a trusted Certificate Authority (CA), clients can establish secure connections without further configuration.

  • TLS Client Certificate Chain (optional): For stronger authentication, mutual TLS (mTLS) can be enabled by specifying a TLS Client Certificate Chain. This method ensures that only clients with valid certificates issued by a trusted authority can send logs to Security Data Lake. If mTLS is configured, the input rejects connections from unauthorized clients.

  • Override source (optional): By default, the source is a hostname derived from the received packet. You can override the default value with a custom string. This option allows you to optimize the source for your specific needs.

Mapping OpenTelemetry Log Fields to Security Data Lake Fields

Once logs are received, Security Data Lake maps key OpenTelemetry log fields to its internal schema for efficient indexing and querying. Because Security Data Lake does not support nested fields in messages, the structure of the incoming log signals, as specified in the OpenTelemetry Protobuf Specification, cannot be mapped exactly to Security Data Lake log messages. When mapping OpenTelemetry logs to Security Data Lake messages, a number of rules are applied.

Note

As a general rule, Security Data Lake automatically replaces dots (.) in incoming message field names with underscores (_).

The following sections highlight how OpenTelemetry log messages are translated into Security Data Lake messages.

Core Security Data Lake Message Fields Mapping

These fields represent how core Security Data Lake message fields are mapped from incoming OpenTelemetry log records.

  • source: The address of the remote party that initiated the connection to the input.

  • timestamp: Uses time_unix_nano from OpenTelemetry logs if available, otherwise observed_time_unix_nano, or the received timestamp of the record at the input as a fallback.

  • message: Content of the OpenTelemetry log record body field.

First-Level Field Mapping

OpenTelemetry Field

Security Data Lake Field

trace_id

otel_trace_id

span_id

otel_span_id

flags

otel_trace_flags

severity_text

otel_severity_text

severity_number

otel_severity_number

time_unix_nano

otel_time_unix_nano

observed_time_unix_nano

otel_observed_time_unix_nano

Resource and Attributes Mapping
  • Resource Attributes: Prefixed with otel_resource_attributes_ and converted to Security Data Lake fields.

  • Resource Schema URL: Mapped to otel_resource_schema_url.

  • Log Attributes: Prefixed with otel_attributes_.

  • Log Schema URL: Mapped to otel_schema_url.

  • Instrumentation Scope:

    • otel_scope_name

    • otel_scope_version

    • otel_scope_attributes_*

Value Handling
  • Primitive Types (string, boolean, integer, double): Directly converted.

  • Bytes: Base64 encoded.

  • Lists:

    • Single-type primitives: Converted to a list.

    • Mixed primitives: Converted to a list of strings.

    • Nested arrays/maps: Serialized as JSON.

  • Maps: Flattened into individual fields, using _ as a separator.
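
The following Python sketch is an illustrative approximation of these attribute-mapping rules (dots replaced with underscores, the otel_attributes_ prefix, nested maps flattened with _); it is not the server implementation.

import base64

def flatten_attributes(attrs: dict, prefix: str = "otel_attributes_") -> dict:
    # Approximate the documented rules: dots become underscores, nested maps
    # are flattened with "_", bytes are Base64 encoded, primitives pass through.
    fields = {}
    for key, value in attrs.items():
        name = prefix + key.replace(".", "_")
        if isinstance(value, dict):
            fields.update(flatten_attributes(value, name + "_"))
        elif isinstance(value, bytes):
            fields[name] = base64.b64encode(value).decode("ascii")
        else:
            fields[name] = value
    return fields

print(flatten_attributes({"http.method": "GET", "client": {"address": "10.0.0.1"}}))
# {'otel_attributes_http_method': 'GET', 'otel_attributes_client_address': '10.0.0.1'}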

Considerations and Limitations

While this input enables Security Data Lake’s support for OpenTelemetry, there are a few important considerations to keep in mind. First, only log data is supported. Metrics and traces transmitted over OTLP/gRPC are not ingested by this input. Additionally, OTLP over HTTP is not supported; the input exclusively accepts data over the gRPC transport. If you are running Security Data Lake behind a load balancer, it is essential to ensure that the required ports are open and that TLS/mTLS configurations are properly forwarded to maintain secure and uninterrupted communication.

Palo Alto Networks Input

The Palo Alto Networks input allows Security Data Lake to receive SYSTEM, THREAT, and TRAFFIC logs directly from a Palo Alto device or the Palo Alto Panorama system. A standard syslog output is used on the device side. Logs are sent with a typical syslog header followed by a comma-separated list of fields. The field order may change between versions of PAN-OS.

Example SYSTEM message:

<14>1 2018-09-19T11:50:35-05:00 Panorama-1 - - - - 1,2018/09/19 11:50:35,000710000506,SYSTEM,general,0,2018/09/19 11:50:35,,general,,0,0,general,informational,"Deviating device: Prod--2, Serial: 007255000045717, Object: N/A, Metric: mp-cpu, Value: 34",1163103,0x0,0,0,0,0,,Panorama-1

To get started, add a new Palo Alto Networks Input (TCP) in System > Inputs. Specify the Security Data Lake node, bind address, port, and adjust the field mappings as needed.

Warning

Palo Alto devices should be configured to send data without custom formats.

Security Data Lake has three different inputs:

  • Palo Alto Networks TCP (PAN-OS v8.x)

  • Palo Alto Networks TCP (PAN-OS v9+)

  • Palo Alto Networks TCP (PAN-OS v11+)

Warning

PAN-OS 8.1, 9.0, and 10.0 are End-of-Life (EoL) according to the Palo Alto Networks website, although critical fixes may still be provided for 8.1. See the Palo Alto documentation for more information.

PAN-OS 8 Input

Note

Before you configure the time zone on the Inputs form, note that the value is set to UTC+00:00 - UTC by default. However, you can set it to a specific offset from the dropdown menu in the input configuration form. Since PAN device logs do not include time zone offset information, this field allows Security Data Lake to correctly parse the timestamps from logs. If your PAN device is set to UTC, you do not need to change this value.

This input ships with a field configuration that is compatible with PAN-OS 8.1. Other versions are supported by customizing the SYSTEM, THREAT, and TRAFFIC mappings on the Add/Edit Input page in Security Data Lake.

The configuration for each message type is a CSV block that must include the position, field, and type headers.

For example:

position,field,type
1,receive_time,STRING
2,serial_number,STRING
3,type,STRING
4,content_threat_type,STRING
5,future_use1,STRING
...

The accepted values for each column are:

  • position: A positive integer value.

  • field: A contiguous string value to use for the field name. Must not include the reserved field names: _id, message, full_message, source, timestamp, level, streams.

  • type: One of the following supported types: BOOLEAN, LONG, STRING.
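
For illustration only, a custom mapping that uses all three supported types might look like the following. The field names here are arbitrary examples, not a complete or official PAN-OS mapping:

position,field,type
1,receive_time,STRING
2,bytes_sent,LONG
3,packet_capture,BOOLEAN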

When the Palo Alto input starts, the validity of each CSV configuration is checked. If the CSV is malformed or contains invalid properties, the input will fail to start. An error will be displayed at the top of the System > Overview page.

For example:

input fail.png

The default mappings built into the plugin are based on the PAN-OS 8.1 specifications. If you are running PAN-OS 8.1, there is no need to edit the mappings. However, if you are running a different version of PAN-OS, refer to the official Palo Alto Networks log fields documentation for that version and customize the mappings on the Add/Edit Input page accordingly.

PAN-OS 9 Input

The PAN-OS 9 input auto-detects whether the ingested data is from version 9.0 or 9.1. Version 9.1 is supported automatically and works out of the box.

We have included links to a few recent versions here for reference.

Version 9.1

Also see the documentation for older PAN-OS versions.

PAN-OS 11 Input

The PAN-OS 11 input automatically detects whether the ingested data is from version 11.0 or later and processes the log data using either processing pipelines or Illuminate content. This input does not fully parse the entire message schema; instead, it extracts key fields such as event_source_product and vendor_subtype, which are added to the message.

We have included links to a few recent versions here for reference.

Version 11.0

Random HTTP Message Generator

The Random HTTP Message Generator input is a Security Data Lake utility designed to produce artificial HTTP message traffic for testing, benchmarking, or demonstration purposes. Instead of relying on an external data source, it autonomously generates HTTP-like messages at configurable intervals and sends them into Security Data Lake’s processing pipeline. This is especially useful when testing stream rules, extractors, pipelines, and dashboards without requiring a live log source.

This input runs locally on a selected Security Data Lake node (or across all nodes if configured as global) and mimics a realistic, non-steady stream of messages by introducing random variations in message timing and source information.

Requirements and Preparation

No external system is required to send messages to this input, since the messages are generated internally by Security Data Lake. However, the following considerations should be met before enabling it:

  • Node availability - Ensure that the node selected to host the input (server, forwarder, or sidecar) is active and properly connected to the cluster.

  • System resources - Continuous message generation can create a high load depending on the configured sleep interval and deviation. Verify that sufficient CPU and memory are available.

  • Intended environment - This input is primarily meant for testing or staging environments, not production, since it generates synthetic data and may interfere with normal message processing statistics.

  • Destination configuration - Make sure Security Data Lake pipelines, streams, or extractors are set up to handle or discard these test messages appropriately.

Configuration in Security Data Lake

When creating the Random HTTP Message Generator input, configure the following fields:

  • Node: Specifies the node on which this input will start. Useful for directing message generation to a specific forwarder or SDL cluster component.

  • Title: A custom name for the input, allowing you to easily identify it among other inputs. Example: “Test HTTP Generator – SDL Cluster”.

  • Sleep time: Defines the base delay (in milliseconds) between two generated messages. Lower values result in higher message throughput.

  • Maximum random sleep time deviation: Adds a random delay (up to this value in milliseconds) to the base sleep time to simulate an irregular message flow. For example, with a base sleep of 25 ms and a deviation of 30 ms, each message will be delayed by between 25 ms and 55 ms.

  • Source name: Specifies the hostname or identifier used in the generated message as its “source” field. This can be any arbitrary name such as example.org or local-test.

  • Allow throttling this input: When enabled, the input automatically pauses message generation if the processing pipeline falls behind, ensuring Security Data Lake doesn’t become overloaded. Typically left disabled for synthetic testing where full throughput is desired.

  • Override source (optional): Allows you to manually override the default hostname derived from the received packet. This is useful if you want all generated messages to appear as if they came from a specific host.

  • Encoding (optional): Sets the character encoding for generated messages. Default is UTF-8, which should be used unless specific encoding tests are needed.

Usage Notes

  • Performance testing - Adjust Sleep time and Maximum random deviation to simulate different message ingestion rates and burst patterns.

  • Stream validation - You can use the generated messages to verify Security Data Lake stream rules, extractors, or pipeline processing logic.

  • Isolation - For clarity during debugging, set a unique Source name to easily filter these messages in the search interface.

  • Cleanup - Since this input produces non-essential data, remember to stop or delete it after testing to avoid cluttering your message index.

Raw HTTP Input

The Raw HTTP input allows the ingestion of plain-text HTTP requests. This input can be used to receive messages in arbitrary log formats in Security Data Lake over HTTP.

Note

This input listens for HTTP POST requests on the /raw path.

Security Data Lake Configuration

When launching a new Raw HTTP input from the Security Data Lake Inputs tab, the following configuration parameters need to be completed:

  • Global

    • Select this check box to enable this input on all Security Data Lake nodes, or keep it unchecked to enable the input on a specific node.

  • Node

    • Select the node on which to start this input. If the Global check box is selected, this option is not available.

  • Title

    • Provide a unique name for your input.

  • Bind Address

    • Enter an IP address for this input to listen on. The source system/data sends logs to this input via this IP address.

  • Port

    • Enter a port to use in conjunction with the IP address.

  • Receive Buffer Size (optional)

    • This setting determines the size of the buffer that stores incoming data before it is processed. A larger buffer can accommodate more data, reducing the chance of data loss during high-traffic periods. Depending on the amount of traffic being ingested by the input, this value should be large enough to ensure proper flow of data but small enough to prevent the system from spending resources processing the buffered data. The optimal size depends on your network traffic volume. Security Data Lake's default setting of 256 KB is conservative and suited to testing and small deployments, so if the input receives high volumes of data, increasing this value is advised. A practical recommendation is to start with a buffer size of at least 1 MB (1024 KB) and adjust based on observed performance.

  • No. of Worker Threads (optional)

    • This setting controls how many concurrent threads are used to process incoming data. Increasing the number of threads can enhance data processing speed, resulting in improved throughput. The ideal number of threads to configure depends on the available CPU cores on your Security Data Lake server. A common starting point is to align the number of worker threads with the number of CPU cores. However, it is crucial to strike a balance with other server demands.

    Note

    The TLS-related settings that follow ensure that valid sources can send messages to the input securely.

  • TLS Cert File (optional)

    • The certificate file that is stored on a Security Data Lake system. The value of this field is a path (/path/to/file) that Security Data Lake must have access to.

  • TLS Private Key File (optional)

    • The certificate private key file that is stored on a Security Data Lake system. The value of this field is a path (/path/to/file) that Security Data Lake must have access to.

  • Enable TLS

    • Select if this input should use TLS.

  • TLS Key Password (optional)

    • The private key password.

  • TLS Client Authentication (optional)

    • If you want to require sources sending logs to this input to authenticate themselves, set this option to optional or required.

  • TLS Client Auth Trusted Certs (optional)

    • The path where the client (source) certificates are located on a Security Data Lake system. The value of this field is a path (/path/to/file) which Security Data Lake must have access to.

  • TCP Keepalive

    • Enable this option if you want the input to support TCP keep-alive packets to prevent idle connections.

  • Enable Bulk Receiving

    • Enable this option to receive bulk messages separated by newlines (\n or \r\n).

  • Enable CORS

    • Enable Cross-Origin Resource Sharing (CORS) to configure your server to send specific headers in the HTTP response that instruct the browser to allow cross-origin requests.

  • Max. HTTP chunk size (optional)

    • For large payloads, it is common practice to split data into smaller chunks (e.g. 8 KB or 64 KB) to prevent overwhelming buffers. The maximum HTTP chunk size is 65536 bytes.

  • Idle writer timeout (optional)

    • The maximum amount of time the server waits for a client to send data when writing to an output stream before closing the connection due to inactivity.

  • Authorization Header Name (optional)

    • Specify a custom authorization header name to optionally enforce authentication for all received messages. This setting is a way to add password-like security for this input.

  • Authorization Header Value (optional)

    • Specify the authorization header value to optionally enforce authentication for all received messages.

  • Override source (optional)

    • By default, messages parse the source field as the provided hostname in the log message. However, if you want to override this setting for devices that output non-standard or unconfigurable hostnames, you can set an alternate source name here.

  • Encoding (optional)

    • All messages need to support the encoding configured for the input. Default encoding is UTF-8. For example, UTF-8 encoded messages should not be sent to an input configured to support UTF-16.

After launching a Raw HTTP input you can use the following endpoints to send messages:

http://graylog.example.org:[port]/raw (POST)

Try sending an example message using curl:

curl -XPOST http://graylog.example.org:12202/raw -d 'Sample message'
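
If you configured an authorization header and enabled bulk receiving, a quick test could look like the following sketch. The header name (authorization), its value, and the port are assumptions for illustration; substitute your own settings, and use https if TLS is enabled on the input:

# Two newline-separated messages in a single request (requires Enable Bulk Receiving)
curl -XPOST http://graylog.example.org:12202/raw \
  -H 'authorization: my-secret-value' \
  --data-binary $'First sample message\nSecond sample message'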

Cloudflare Logpush with Raw HTTP Input

Logs from the Cloudflare Logpush service (via HTTP destination) can be ingested into Security Data Lake using the Raw HTTP input. When set up and configured, Logpush posts newline-delimited batches of log messages to the input over HTTP.

General information about this input, including configuration options, may be found in the Raw HTTP Input documentation.

Note

You may review an additional use case for the Raw HTTP input in GitLab Audit Event Streaming with Raw HTTP Input.

Prerequisites

Before proceeding, ensure that the following prerequisites are met:

  • A Cloudflare subscription is required.

  • The Cloudflare Logpush HTTP destination service must be able to forward to an endpoint in your environment that is secured with TLS. See Secure Inputs with TLS for more information. (You may also choose to route through a firewall or gateway to fulfill the TLS requirement.)

  • We strongly recommend using the Authorization Header option when setting up the Raw HTTP input to ensure message requests are authenticated.

Set up the Input

Navigate to System > Inputs and select Raw HTTP to launch the new input. The following configuration settings must be carefully considered when setting up this input for Cloudflare Logpush:

  • Bind Address and Port: Ensure that Cloudflare can route through your network to the IP address and port specified. Note that the Raw HTTP input listens for HTTP requests at the /raw root HTTP path.

  • TLS Settings: TLS must either be enabled for this endpoint, or you can choose to route through a firewall or gateway to fulfill the required usage of TLS.

  • Enable Bulk Receiving: Be sure to select this option. This will ensure that the input will correctly split newline-delimited batches of log messages sent from Cloudflare.

  • Authorization Header: Specify a name and value for the authorization header to use. This will ensure that the input will only accept communication where appropriate authentication is validated.

    • Authorization Header Name: authorization

    • Authorization Header Value: Choose a secure password with sufficient length and complexity to meet your requirements. Use the same value for the authorization setting in Cloudflare.

For the additional configuration settings available, see the Raw HTTP Input documentation for more details. Unless required for your environment, we recommend you use the default settings when determining these additional configuration properties.

Enable the HTTP Destination in Cloudflare

After setting up the new input, you must enable the Logpush service to send logs to Security Data Lake. This is done by defining the Security Data Lake endpoint as a Logpush destination. For information on this process, see the Cloudflare documentation.

Note

The first few steps described in the Cloudflare documentation direct you to select the appropriate website (i.e. domain) you want to use with Logpush. You can do this by selecting Websites from the Cloudflare management console navigation bar and clicking Add a domain. This step is essential to getting your Cloudflare logs into Security Data Lake.

When you are prompted to enter the URI where the Raw HTTP input is listening for requests, ensure the URL includes the /raw root path. For example:

https://graylog-host:port/raw?header_Authorization=<Authorization Header Value>
Logpush_destination.png

GitLab Audit Event Streaming with Raw HTTP Input

Logs from GitLab Audit Event Streaming (via HTTP destinations) can be ingested into Security Data Lake using the Raw HTTP input. When configured successfully, GitLab will post newline-delimited batches of log messages to the input over HTTP.

General information about this input, including configuration options, may be found in the Raw HTTP Input documentation.

Note

You may review an additional use case for the Raw HTTP input in Cloudflare Logpush with Raw HTTP Input.

Prerequisites

Before proceeding, ensure that the following prerequisites are met:

  • An existing GitLab account is required.

  • A Security Data Lake Raw HTTP input must be configured to listen on a port that can accept traffic from GitLab’s service running on the public internet.

Set Up GitLab Audit Event Streaming

To stream GitLab audit logs into Security Data Lake, several key configuration steps are required. This section outlines the necessary setup for integrating GitLab's Audit Event Streaming with a Security Data Lake instance. It begins with configuring GitLab to forward audit events, followed by specifying the destination details within Security Data Lake, such as the destination name, server URL, and custom HTTP headers. Additionally, optional event filtering can be implemented to tailor which audit logs are captured. See the official GitLab documentation for more information.

Configure the Destination
  • Destination Name: Assign an appropriate destination name.

  • Destination URL: Specify the public-facing host name and port for the Security Data Lake server where the Raw HTTP input is running, e.g. https://<datainsights-server-hostname>/raw.

  • Custom HTTP Headers: Add a custom header with the same values specified in the input configuration above. Add any additional headers as required by your particular network setup.

  • (Optional) Event Filtering: Define filters to determine which logs are streamed to the Security Data Lake input.

Destinations.png
Set up the Input

Navigate to System > Inputs and select Raw HTTP to launch the new input. The following configuration settings must be carefully considered when setting up this input for GitLab Audit Event Streaming:

  • Bind Address and Port: Ensure that GitLab can route through your network to the IP address and port specified. Note that the Raw HTTP input listens for HTTP requests at the /raw root HTTP path.

  • Authorization Header: Specify a name and value for the authorization header to use. This will ensure that the input will only accept communication where appropriate authentication is validated. Enter the same values used when configuring the GitLab Audit Streaming service in the previous section.

  • TLS Settings: TLS must either be enabled for this endpoint, or you can choose to route through a firewall or gateway to fulfill the required usage of TLS.

  • Enable Bulk Receiving: Be sure to select this option. This will ensure that the input will correctly split newline-delimited batches of log messages sent from GitLab.

For the additional configuration settings available, see the Raw HTTP Input documentation for more details. Unless required for your environment, we recommend you use the default settings when determining these additional configuration properties.
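
To verify the input end to end before enabling streaming in GitLab, you can post a small newline-delimited batch yourself. This is only a sketch: the host, custom header name, and header value are placeholders for the values you configured above, and events.ndjson is an arbitrary test file containing one JSON object per line:

# Simulates a batched delivery of newline-delimited events with the custom header
curl -XPOST 'https://<datainsights-server-hostname>/raw' \
  -H '<Custom Header Name>: <Custom Header Value>' \
  --data-binary @events.ndjson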

Salesforce Input

Salesforce provides cloud-based business management, customer relationship management, and sales tools. The platform generates multiple types of logs that Security Data Lake can collect through the EventLogFile API. For more information, see the official Salesforce documentation for all supported log event types.

Security Data Lake currently supports all EventLogFile source types as of version 58 of the Salesforce EventLogFile API.

1. Make sure the prerequisites are met

Make sure that the following prerequisites are met:

2. Set Up Salesforce for EventLogFile API Access

Before configuring the input in Security Data Lake, complete the following steps in Salesforce to enable access to the EventLogFile API:

  1. Create a Connected App in the Salesforce App Manager. For detailed instructions, refer to the Salesforce documentation.

  2. Grant read permissions for the EventLogFile API to the Security Data Lake application during the Connected App setup.

  3. Configure OAuth for the Connected App. This step generates the Client ID and Client Secret required for Security Data Lake to connect to the Salesforce API. For more information, review the Salesforce documentation.

3. Configure the Input in Security Data Lake

Configure the input by entering the following values:

  • Input Name: Enter a unique name for the Salesforce input.

  • Base Salesforce URL: Enter the full base URL for your Salesforce instance, for example: https://instance.my.salesforce.com.

  • Consumer Key: Enter the Consumer Key from the Salesforce Connected App that was created with the required API permissions.

  • Consumer Secret: Enter the Consumer Secret from the Salesforce Connected App.

  • Log Types to Collect: Select the activity log types to collect. The input will fetch logs for the selected content types.

  • Polling Interval: Specifies how often, in minutes, Security Data Lake checks for new data in Salesforce. The minimum allowable interval is 5 minutes.

  • Enable Throttling: Allows Security Data Lake to pause reading new data for this input when message processing falls behind, giving the system time to catch up. This setting is useful for inputs reading from files or message queue systems such as AMQP or Kafka. For inputs that regularly poll external systems (for example, via HTTP), it is recommended to leave this option disabled.

Sophos Central Input

The Sophos Central input collects events and alerts from the Sophos Central SIEM Integration API for analysis in Security Data Lake.

1. Make sure the prerequisites are met

Make sure that the following prerequisites are met:

2. Configure Sophos for API Access

Complete the following steps in Sophos Central to enable access to the SIEM Integration API:

  1. Generate API authentication credentials by following the official Sophos API Credentials Management documentation.

  2. When creating the credentials, select Service Principal Read-Only to grant the required access for SIEM Integration logs.

  3. After the credentials are generated, copy the Client ID and Secret ID. You will need these values when configuring the input in Security Data Lake.

3. Configure the Input in Security Data Lake

Configure the input by entering the following values:

  • Input Name: Enter a unique name for the Sophos Central input.

  • Client ID: Enter the Client ID provided during the Sophos API Credential setup.

  • Client Secret: Enter the Client Secret provided during the Sophos API Credential setup.

  • Ingest Alerts: This input automatically ingests Sophos events. Select this option to also ingest Sophos alerts. For more details, refer to the Sophos documentation.

  • Polling Interval: Specifies how often, in minutes, the input checks for new logs. The minimum allowable interval is 5 minutes.

  • Enable Throttling: Allows Security Data Lake to pause reading new data for this input when message processing falls behind, giving the system time to catch up.

Important

The Sophos SIEM Integration API only retains log data for 24 hours. To avoid gaps in the logs, we recommend that you do not keep this input stopped for extended periods.

Symantec EDR Events Input

Symantec Endpoint Detection and Response (EDR) is used to detect suspicious activities in your environment and take appropriate action. EDR collects various incidents and event types.

Prerequisites

Complete Setup in EDR

  • For Security Data Lake to connect to the Symantec EDR API, an OAuth client with sufficient permissions must be created; this produces the Client ID and Client Secret used to connect to the API. Instructions for creating an OAuth client are available in the Symantec documentation, "Generating an OAuth Client."

  • A custom role must be specified with the following permissions: atp_view_events, atp_view_incidents, atp_view_audit, and atp_view_datafeeds.

Configure Input in Security Data Lake

To launch a new Symantec EDR Events input:

  1. Navigate to System > Inputs.

  2. Select Symantec EDR Events from the input options and click the Launch new input button.

  3. Follow the setup wizard to configure the input.

Configuration Parameters

  • Input Name

    • Provide a unique name for your new input.

  • Management Server Host

    • The IP address or host name of your Symantec EDR Management server.

  • Client ID

    • The Client ID of the Symantec EDR Connected App created with sufficient API permissions.

  • Client Secret

    • The Client Secret of the Symantec EDR Connected App.

  • Log Types to Collect

    • The type of activity logs to fetch.

  • Polling Interval

    • How often (in minutes) Security Data Lake checks for new data in Symantec EDR. The smallest allowable interval is 5 minutes.

  • Enable Throttling

    • If enabled, no new message is read from this input until Security Data Lake catches up with its message load. This configuration parameter is typically useful for inputs reading from files or message queue systems like AMQP or Kafka. If you regularly poll an external system, e.g. via HTTP, you should leave this option disabled.

Supported Log Types

Security Data Lake offers support for a variety of event type IDs and incidents. For a detailed list of Symantec event detection types and descriptions, review the documentation on event detection types and descriptions.

Symantec SES Events Input

Symantec Endpoint Security (SES) is the fully cloud-managed version of Symantec Endpoint Protection (SEP). It provides multi-layered protection to prevent threats across all attack vectors. SES generates multiple types of incidents and event logs that can be collected and analyzed in Security Data Lake.

1. Make sure the prerequisites are met

Make sure that the following prerequisites are met:

2. Configure Symantec SES for API Access

Complete the following steps in Symantec Endpoint Security (SES) to allow access to the Event Stream API:

  1. Create an event stream and a client application in Symantec SES.

  2. When configuring the client application, assign read permissions for events and alerts to enable Security Data Lake to collect data from the Event Stream API.

3. Create a Client Application

Follow these steps to create a client application in Symantec Endpoint Security (SES):

  1. Add a new client application in the SES console.

  2. Record the Client ID and OAuth token generated for the application. These credentials will be required when configuring the input in Security Data Lake.

  3. Assign permissions to the client application: set View permissions for Alerts & Events and Investigation.

    Create Client Application.png

4. Create an Event Stream

Follow these steps to create an event stream in Symantec Endpoint Security (SES):

  1. Open the Symantec SES console.

  2. Create a new event stream.

  3. Select all event types you want Security Data Lake to receive when configuring the event stream.

  4. Record the Stream GUID and Channel values. These are required when configuring the input in Security Data Lake.

5. Configure the Input in Security Data Lake

Configure the input by entering the following values:

  • Input Name: Enter a unique name for the Symantec SES input.

  • OAuth Credentials: Enter the OAuth token for the Symantec SES client application created with sufficient API permissions.

  • Hosting Location: Select the region where your Symantec SES instance is hosted.

  • Log Types to Collect: Select the activity log types that Security Data Lake should fetch from Symantec SES.

  • Stream GUID: Enter the GUID of the event stream created with the required event types for data streaming.

  • Number of Channels: Specify the number of channels configured for the event stream.

  • Polling Interval: Defines how often, in minutes, Security Data Lake checks for new data from Symantec SES. The minimum allowable interval is 5 minutes.

  • Enable Throttling: Allows Security Data Lake to pause reading new data for this input when message processing falls behind, giving the system time to catch up. This option is primarily useful for inputs reading from files or message queue systems such as AMQP or Kafka. For inputs that poll external systems (e.g., via HTTP), it is recommended to leave this option disabled.

  • Checkpoint Interval: Specifies how often, in seconds, Security Data Lake records checkpoints for Symantec SES data streams.

  • Stream Connection Timeout: Defines the event stream connection timeout in minutes. This value determines how long the stream connection remains active.

Supported Log Types

Security Data Lake offers support for a variety of event type IDs and incidents. For a detailed list of Symantec event detection types and descriptions, review the documentation on event detection types and descriptions.