Data Lake
A Data Lake is a centralized repository for storing large volumes of log data that do not need to be searched or analyzed immediately but must still be retained for future use. Data Lakes are ideal for long-term storage, compliance, or archiving purposes, allowing you to keep logs accessible without affecting active search performance in Security Data Lake.
Data Lake routing is configured per stream, meaning all messages filtered into that stream can be written directly to the Data Lake after processing. Stored log data can later be previewed, retrieved, and reanalyzed for use cases such as event investigation, alert monitoring, dashboard visualization, and reporting.
Benefits
A Data Lake offers low-cost, long-term log storage. Data can be previewed and retrieved when needed for analysis.
Data Lake Preview provides a quick, high-level view to help decide whether to start a retrieval.
You can retrieve Data Lake logs for search, analysis, visualization, reporting, and more.
Logs routed to the Data Lake (and not to the search backend) do not count against the licensed daily ingestion volume until they are retrieved by a user.
Data Lake vs. Archive
You can use both a Data Lake and archives for long-term preservation. While they serve similar purposes, a Data Lake is often preferable for data with lower immediate value because:
Retrieval is typically faster due to granular restores.
Data is compressed, which can reduce storage costs.
Route Your Logs to Your Data Lake
To route logs to the Data Lake, enable Data Routing on the target stream and select Data Lake as a log destination. You can add filter rules on the stream to control which logs go to the Data Lake and which go to other destinations (for example, an index set).
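The routing decision described above can be pictured as a small function. This is only an illustrative sketch, not the product's implementation: the rule shape, the `route` helper, the destination labels, and the `level` field are all hypothetical, and it assumes the first matching rule decides where a message goes.

```python
from typing import Callable, Dict, List, Set, Tuple

Message = Dict[str, str]
# Hypothetical rule shape: (predicate, destinations for messages that match).
Rule = Tuple[Callable[[Message], bool], Set[str]]

DATA_LAKE = "data_lake"   # hypothetical destination labels
INDEX_SET = "index_set"


def route(message: Message, rules: List[Rule], default: Set[str]) -> Set[str]:
    """Return the destinations for a message; the first matching rule wins."""
    for matches, destinations in rules:
        if matches(message):
            return destinations
    return default


# Example: keep debug-level logs out of the index set, send everything else to both.
rules: List[Rule] = [
    (lambda m: m.get("level") == "debug", {DATA_LAKE}),
]
both = {DATA_LAKE, INDEX_SET}

debug_destinations = route({"level": "debug", "source": "app01"}, rules, both)
error_destinations = route({"level": "error", "source": "app01"}, rules, both)
```

In this sketch the debug message lands only in the Data Lake, while the error message goes to both destinations.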
Preview Data in Your Data Lake
The Data Lake Preview provides a high-level look at stored data. Use filters on the Preview page to target specific data and inspect it before deciding to retrieve it into Security Data Lake for search and analysis.
Retrieve Logs from Your Data Lake
To search and analyze logs stored in a Data Lake, you must first retrieve the data so it can be written back to your search backend. Retrieval operations are based on the streams originally used to route data into the Data Lake. When retrieved, logs are restored to the index set defined when the stream was created.
You can perform selective retrieval by applying filters and defining a specific time range to narrow down which data is pulled. To confirm that the correct data exists before retrieval, use Data Lake Preview to check for matching logs.
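Selective retrieval can be thought of as a predicate applied to each stored message. The sketch below is a simplified model under stated assumptions: `is_selected` is a hypothetical helper, filters are treated as exact field/value matches, the time range is inclusive, and the field names in the example are made up.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def is_selected(
    message: Dict[str, object],
    start: datetime,
    end: datetime,
    filters: List[Tuple[str, str]],
    combine: str = "AND",
) -> bool:
    """Return True if the message falls in the time range and passes the filters."""
    timestamp = message["timestamp"]
    if not (start <= timestamp <= end):
        return False
    if not filters:
        return True  # a time range alone is enough to select a message
    checks = [message.get(name) == value for name, value in filters]
    return all(checks) if combine == "AND" else any(checks)


# Example message with hypothetical field names.
msg = {"timestamp": datetime(2024, 5, 1, 12, 0), "source": "fw01", "action": "deny"}
window = (datetime(2024, 5, 1, 0, 0), datetime(2024, 5, 2, 0, 0))

both_match = is_selected(msg, *window, [("source", "fw01"), ("action", "deny")])
either_match = is_selected(msg, *window, [("source", "fw02"), ("action", "deny")], "OR")
out_of_range = is_selected(msg, datetime(2024, 6, 1), datetime(2024, 6, 2), [])
```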
Manage Your Data Lake
After you set up a Data Lake, you can monitor and manage it from the Overview tab at Data Lake > Setup. The Data Lake Jobs section lists current and recent jobs running against the Data Lake, along with their status. Use this information to troubleshoot issues and to plan data retrieval operations.
This page also lists all the streams that route logs into the Data Lake, with details about each. From the streams list, you can start a Preview logs or Retrieve logs action for a stream. Click Data Routing to review or update the data routing definition for the stream.
Preview Logs from a Data Lake
Data Lake Preview provides visibility into log data stored in your Data Lake. You can preview and examine logs before retrieving them for search and analysis. Previewing data does not affect license usage. Log data counts toward license usage only after retrieval, if it has not already been sent to your search backend.
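The license rule above can be summarized in a few lines. This is a minimal sketch of the accounting logic as described here, not how the product computes usage: the `LogBatch` class and `licensed_bytes` function are hypothetical, and it assumes data sent to the search backend is counted once at ingest.

```python
from dataclasses import dataclass


@dataclass
class LogBatch:
    size_bytes: int
    sent_to_search_backend: bool  # routed to the search backend at ingest time
    retrieved: bool = False       # later retrieved from the Data Lake


def licensed_bytes(batch: LogBatch) -> int:
    """Return how many bytes of this batch count toward license usage."""
    if batch.sent_to_search_backend:
        # Assumed to be counted at ingest; a later retrieval does not count it again.
        return batch.size_bytes
    # Data Lake only: counts only after it has been retrieved.
    return batch.size_bytes if batch.retrieved else 0


lake_only = licensed_bytes(LogBatch(1_000, sent_to_search_backend=False))
lake_retrieved = licensed_bytes(LogBatch(1_000, sent_to_search_backend=False, retrieved=True))
already_indexed = licensed_bytes(LogBatch(1_000, sent_to_search_backend=True, retrieved=True))
```

Previewing does not appear in the model at all, which mirrors the statement that previews do not affect license usage.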
Data Lake Preview vs. Search
Data Lake Preview is not equivalent to a Security Data Lake search.
Search operates on indexed data stored in your search backend and supports custom query syntax.
Preview provides a simplified interface for viewing stored log data but has limited filtering options and field availability.
To preview data, select a stream routed to your Data Lake, then narrow your view using filters and time range controls based on the timestamps of the stored messages.
Note
Data Lake filters differ from those used in standard Security Data Lake searches:
Preview filters are based on metadata from the storage infrastructure, not the original log fields.
You cannot create or apply filters for custom field names.
Preview Log Data
To preview data in a Data Lake:
Go to Data Lake > Preview.
Select the stream from which you want to preview log data.
(Optional) Select the time range from which you want to preview log data. By default, the preview displays logs from the past 30 minutes.
(Optional) Apply filters to narrow results:
Select Filter by fields, choose a field from the dropdown, and enter a value.
Click Add field to add more filters.
Combine multiple filters using AND or OR logic.
While the preview can use many filters, a retrieval operation supports a maximum of three filters.
Note
Filtering on the stream or associated_assets fields can significantly increase processing time due to intensive in-memory computation and data transfer.
For optimal performance, use these filters with a narrow time range and limit the number of additional filters.
Click Perform Search.
If your Data Lake contains large volumes of data, the preview may take time to complete. Only one preview job can run per user at a time, and preview job execution is limited to four hours. Jobs exceeding this limit will fail.
Matching logs appear in a list view widget on the Preview page. The most recent preview results are available for 24 hours. You can refine results by adjusting filters or time ranges, then selecting Perform Search again.
To customize the list view, click the Edit icon at the top right of the table to:
Add or remove columns.
Reorder columns to prioritize relevant fields.
Add known custom fields to the table if they exist in the stored data.
Select the expand icon beside any row to view a pop-up containing all fields and values for that message. You can also use the Highlighting option at the top of the page to color-code fields or values for easier identification.
Note
The preview widget can display up to 500 results. For large datasets, try different filter combinations to isolate the data you need.
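The limits mentioned in this section can be collected in one place when planning previews. The following is a rough sketch only: the constant and function names are hypothetical, and it assumes the 24-hour availability window is measured from the time the preview job completes.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

DEFAULT_PREVIEW_WINDOW = timedelta(minutes=30)  # default preview time range
MAX_PREVIEW_JOB_RUNTIME = timedelta(hours=4)    # jobs running longer than this fail
PREVIEW_RESULT_TTL = timedelta(hours=24)        # most recent results stay available
MAX_PREVIEW_RESULTS = 500                       # rows the preview widget can display


def default_time_range(now: datetime) -> Tuple[datetime, datetime]:
    """Return the (start, end) range a preview uses when none is selected."""
    return now - DEFAULT_PREVIEW_WINDOW, now


def job_timed_out(started_at: datetime, now: datetime) -> bool:
    """Return True if a preview job has exceeded the execution limit."""
    return now - started_at > MAX_PREVIEW_JOB_RUNTIME


def results_still_available(completed_at: datetime, now: datetime) -> bool:
    # Assumption: the 24-hour window starts when the job completes.
    return now - completed_at <= PREVIEW_RESULT_TTL


def is_truncated(match_count: int) -> bool:
    """Return True if more logs matched than the widget can show."""
    return match_count > MAX_PREVIEW_RESULTS


now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
start, end = default_time_range(now)
```

If `is_truncated` would be true for a query, narrowing the time range or adding filters is the way to bring the result set under the display limit.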
Retrieve Log Data from Preview
After reviewing your log data, you can retrieve it directly from the preview. There are two retrieval methods:
Retrieve from the full preview
Select Retrieve logs in the upper-right corner of the preview list.
The Retrieve Logs window opens, pre-filled with your current filters and selections.
Adjust settings as needed, then click Retrieve.
Note
A retrieval job can include no more than three filters.
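Because a preview may carry more filters than a retrieval accepts, it can help to check the count before starting a retrieval. A minimal sketch, assuming filters are represented as hypothetical (field, value) pairs; `check_retrieval_filters` is not a product function.

```python
from typing import List, Tuple

MAX_RETRIEVAL_FILTERS = 3  # limit stated in this document


def check_retrieval_filters(filters: List[Tuple[str, str]]) -> None:
    """Raise ValueError if the filter list exceeds the retrieval limit."""
    if len(filters) > MAX_RETRIEVAL_FILTERS:
        raise ValueError(
            f"{len(filters)} filters given; a retrieval supports at most "
            f"{MAX_RETRIEVAL_FILTERS}. Remove filters before retrieving."
        )


# Hypothetical preview filters: four are fine for a preview, too many for a retrieval.
preview_filters = [("source", "fw01"), ("action", "deny"), ("proto", "tcp"), ("port", "22")]

try:
    check_retrieval_filters(preview_filters)
    too_many = False
except ValueError:
    too_many = True

check_retrieval_filters(preview_filters[:3])  # within the limit, no error
```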
Retrieve selected log messages
In the preview results, select the checkboxes for the log messages you want to retrieve.
Select Bulk actions > Retrieve Logs.
Confirm the retrieval by selecting Retrieve in the window showing the number of selected logs.
You can track retrieval progress from the Overview tab under Data Lake > Setup.
After the retrieval completes, you can view the retrieved logs by going to Streams > [your stream] > Data Routing > Destinations > Data Lake > Retrieval Operations.
Retrieve Logs from a Data Lake
When you want to search and analyze logs stored in a Data Lake, you must first retrieve them so that they can be written to your search backend. You can retrieve logs from specific streams based on time ranges, and you can apply filters to narrow the scope of the retrieved data.
Logs are restored to the index set defined when the stream was originally created.
Note
Logs routed to a Data Lake and not sent to your search backend do not count against license usage until they are retrieved.
Retrieved log data counts against license usage once the retrieval is complete.
Retrieved Data Index
When logs are retrieved from the Data Lake, the restored data is sent to your search backend and indexed for search and other operations.
Each retrieval creates a new index with the prefix restored-archive-data-lake, followed by a unique numeric identifier. Once indexed, the restored data behaves the same as data originally routed to your search backend.
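If you need to pick restored indices out of a list of index names, the naming convention above can be matched with a pattern. This is a sketch under assumptions: the prefix comes from the text, but the separator between the prefix and the numeric identifier is not specified here, so the pattern tolerates an optional hyphen or underscore. The `restored_index_id` helper and the sample names are hypothetical.

```python
import re
from typing import Optional

# Prefix taken from the text; the optional "-" or "_" separator is an assumption.
RESTORED_INDEX = re.compile(r"^restored-archive-data-lake[-_]?(\d+)$")


def restored_index_id(index_name: str) -> Optional[int]:
    """Return the numeric identifier of a restored index, or None if it is not one."""
    match = RESTORED_INDEX.match(index_name)
    return int(match.group(1)) if match else None


# Made-up index names for illustration only.
names = ["restored-archive-data-lake_12", "restored-archive-data-lake-7", "other_index_0"]
ids = [restored_index_id(name) for name in names]
```

Verify the pattern against the actual index names in your search backend before relying on it.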
Note
Retrieving data does not remove it from the Data Lake. A copy of the restored data remains stored in the Data Lake.
Retrieve Logs
To retrieve log data from a Data Lake, follow these steps:
Navigate to Data Lake > Setup > Overview.
Locate the stream you want to retrieve logs from.
Select Retrieve logs.
In the dialog box, set the Time Range using the date and time pickers.
Under Filter Retrieval by Original Destination, choose one of the following options:
Must exclude: Search Cluster – Retrieves data only from the Data Lake (data not previously indexed).
Must include: Search Cluster – Retrieves data only from your search backend (previously indexed data).
Include All – Retrieves data from both sources.
(Optional) Add filters to further limit the log data retrieved:
Select Add filter, then select a field name and enter a filter value.
Add up to three filters and combine them using AND or OR logic.
Note
You cannot apply custom filters or queries beyond the available options. For additional information, refer to Data Lake Preview.
Select Retrieve to start the process.
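The three Filter Retrieval by Original Destination options in the steps above can be read as a predicate over where each stored message was originally sent. The following is a minimal sketch of that reading, not the product's logic: the enum, the `matches_original_destination` function, and the `was_indexed` flag are hypothetical names.

```python
from enum import Enum


class OriginalDestination(Enum):
    MUST_EXCLUDE_SEARCH_CLUSTER = "must_exclude"
    MUST_INCLUDE_SEARCH_CLUSTER = "must_include"
    INCLUDE_ALL = "include_all"


def matches_original_destination(was_indexed: bool, option: OriginalDestination) -> bool:
    """Return True if a stored message qualifies for retrieval under the option.

    was_indexed: hypothetical flag, True if the message was also sent to the
    search backend when it was first routed.
    """
    if option is OriginalDestination.INCLUDE_ALL:
        return True
    if option is OriginalDestination.MUST_INCLUDE_SEARCH_CLUSTER:
        return was_indexed
    return not was_indexed  # MUST_EXCLUDE_SEARCH_CLUSTER


# Messages that only ever went to the Data Lake qualify under "Must exclude".
lake_only_selected = matches_original_destination(
    False, OriginalDestination.MUST_EXCLUDE_SEARCH_CLUSTER
)
indexed_selected = matches_original_destination(
    True, OriginalDestination.MUST_EXCLUDE_SEARCH_CLUSTER
)
```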
As you configure your retrieval, an estimate appears at the bottom of the dialog box showing the approximate amount of data being searched. The retrieval process may take time depending on the data volume.
Once started, the retrieval job appears under Data Lake > Setup > Overview > Data Lake Jobs. When the retrieval is complete, your logs become available for search and analysis.
You can also initiate retrieval directly from the Data Lake Preview page after inspecting your data.
Once the retrieval is finished, you can view the restored logs by navigating to Streams > [your stream] > Data Routing > Destinations > Data Lake > Retrieval Operations.