[feature](workload) Support remote scan bytes breaker in workload policy#64649

Open: wenzhenghu wants to merge 3 commits into apache:master from wenzhenghu:feature/workload-policy-remote-read-bytes

@wenzhenghu

Contributor

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary:

This PR adds a new workload policy condition be_scan_bytes_from_remote_storage, which allows Doris to cancel queries according to the amount of data read from remote storage by BE scan tasks. This is useful for limiting external table queries that read too much remote HDFS or object storage data.

Implementation summary:

  • Add a new BE-side workload metric type in thrift for remote storage scan bytes.
  • Add FE workload policy parsing, validation, metadata mapping, and replay support for be_scan_bytes_from_remote_storage.
  • Add BE workload condition evaluation based on io_context()->scan_bytes_from_remote_storage().
  • Add regression coverage using an existing Hive external lineitem table.
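
The BE-side evaluation step can be pictured as a threshold check on a byte counter. The following is a minimal sketch under that assumption, not the code in this PR: `CompareOp`, `RemoteScanBytesCondition`, and `eval` are hypothetical names introduced for illustration, and the counter value is assumed to be whatever `io_context()->scan_bytes_from_remote_storage()` returns for the query.

```cpp
#include <cstdint>

// Hypothetical comparison operators a policy condition might support.
enum class CompareOp { kGreater, kGreaterEqual, kLess, kLessEqual, kEqual };

// Hypothetical condition object: holds the operator and the byte threshold
// configured in the workload policy.
struct RemoteScanBytesCondition {
    CompareOp op;
    int64_t threshold_bytes;

    // Returns true when the observed remote scan bytes satisfy the condition,
    // i.e. when the policy action (such as cancelling the query) should fire.
    bool eval(int64_t remote_scan_bytes) const {
        switch (op) {
        case CompareOp::kGreater:
            return remote_scan_bytes > threshold_bytes;
        case CompareOp::kGreaterEqual:
            return remote_scan_bytes >= threshold_bytes;
        case CompareOp::kLess:
            return remote_scan_bytes < threshold_bytes;
        case CompareOp::kLessEqual:
            return remote_scan_bytes <= threshold_bytes;
        case CompareOp::kEqual:
            return remote_scan_bytes == threshold_bytes;
        }
        return false; // Unknown operator: never trigger.
    }
};
```

In this sketch a condition built with `CompareOp::kGreater` and a threshold of 1024 stays quiet at exactly 1024 bytes and triggers from 1025 bytes onward.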

Release note

Support workload policy cancellation by BE remote storage scan bytes.

Check List (For Author)

  • Test:
    • FE UT: passed
    • BE UT: passed
    • Regression test: passed, test_workload_policy_remote_scan_bytes
    • Manual test: verified existing workload policy behavior and new remote scan bytes cancellation on a deployed Doris instance
  • Behavior changed: Yes. Add a new workload policy condition be_scan_bytes_from_remote_storage.
  • Does this need documentation: Yes. The workload policy condition list should be updated.

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Add a Hive external table regression test for workload policy cancellation by BE-side remote storage scan bytes. The test creates an HMS catalog, chooses an existing lineitem table from tpch1_parquet or tpch1, creates a workload policy with be_scan_bytes_from_remote_storage, and verifies that the query is cancelled with the remote scan bytes counter in the error message.

### Release note

None

### Check List (For Author)

- Test: Regression test / syntax check
    - Ran test_workload_policy_remote_scan_bytes against 172.16.0.90:9036
    - Ran Groovy FileSystemCompiler syntax check
    - Ran git diff --cached --check
- Behavior changed: No
- Does this need documentation: No
@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@wenzhenghu

Contributor Author

run buildall

@hello-stephen

Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 77.32% (1889/2443) |
| Line Coverage | 64.43% (33969/52726) |
| Region Coverage | 64.78% (17457/26948) |
| Branch Coverage | 53.95% (9342/17316) |
