Guides Java SDK development in Apache Beam, including building, testing, running examples, and understanding the project structure. Use when working with Java code in sdks/java/, runners/, or examples/java/.
Install
npx skillscat add apache/beam/java-development Install via the SkillsCat registry.
SKILL.md
Java Development in Apache Beam
Project Structure
Key Directories
sdks/java/core- Core Java SDK (PCollection, PTransform, Pipeline)sdks/java/harness- SDK harness (container entrypoint)sdks/java/io/- I/O connectors (51+ connectors including BigQuery, Kafka, JDBC, etc.)sdks/java/extensions/- Extensions (SQL, ML, protobuf, etc.)runners/- Runner implementations:runners/direct-java- Direct Runner (local execution)runners/flink/- Flink Runnerrunners/spark/- Spark Runnerrunners/google-cloud-dataflow-java/- Dataflow Runner
examples/java/- Java examples including WordCount
Build System
Apache Beam uses Gradle with a custom BeamModulePlugin. Every Java project's build.gradle starts with:
apply plugin: 'org.apache.beam.module'
applyJavaNature( ... )Common Commands
Build Commands
# Compile a specific project
./gradlew -p sdks/java/core compileJava
# Build a project (compile + tests)
./gradlew :sdks:java:harness:build
# Run WordCount example
./gradlew :examples:java:wordCountRunning Unit Tests
# Run all tests in a project
./gradlew :sdks:java:harness:test
# Run a specific test class
./gradlew :sdks:java:harness:test --tests org.apache.beam.fn.harness.CachesTest
# Run tests matching a pattern
./gradlew :sdks:java:harness:test --tests *CachesTest
# Run a specific test method
./gradlew :sdks:java:harness:test --tests *CachesTest.testClearableCacheRunning Integration Tests
Integration tests have filenames ending in IT.java and use TestPipeline.
# Run I/O integration tests on Direct Runner
./gradlew :sdks:java:io:google-cloud-platform:integrationTest
# Run with custom GCP project
./gradlew :sdks:java:io:google-cloud-platform:integrationTest \
-PgcpProject=<project> -PgcpTempRoot=gs://<bucket>/path
# Run on Dataflow Runner
./gradlew :runners:google-cloud-dataflow-java:examplesJavaRunnerV2IntegrationTest \
-PdisableSpotlessCheck=true -PdisableCheckStyle=true -PskipCheckerFramework \
-PgcpProject=<project> -PgcpRegion=us-central1 -PgcsTempRoot=gs://<bucket>/tmpCode Formatting
# Format Java code
./gradlew spotlessApplyWriting Integration Tests
@Rule public TestPipeline pipeline = TestPipeline.create();
@Test
public void testSomething() {
pipeline.apply(...);
pipeline.run().waitUntilFinish();
}Set pipeline options via -DbeamTestPipelineOptions='[...]':
-DbeamTestPipelineOptions='["--runner=TestDataflowRunner","--project=myproject","--region=us-central1","--stagingLocation=gs://bucket/path"]'Using Modified Beam Code
Publish to Maven Local
# Publish a specific module
./gradlew -Ppublishing -p sdks/java/io/kafka publishToMavenLocal
# Publish all modules
./gradlew -Ppublishing publishToMavenLocalBuilding SDK Container
# Build Java SDK container (for Runner v2)
./gradlew :sdks:java:container:java11:docker
# Tag and push
docker tag apache/beam_java11_sdk:2.XX.0.dev \
"us-docker.pkg.dev/your-project/beam/beam_java11_sdk:custom"
docker push "us-docker.pkg.dev/your-project/beam/beam_java11_sdk:custom"Building Dataflow Worker Jar
./gradlew :runners:google-cloud-dataflow-java:worker:shadowJarTest Naming Conventions
- Unit tests:
*Test.java - Integration tests:
*IT.java
JUnit Report Location
After running tests, find HTML reports at:<project>/build/reports/tests/test/index.html
IDE Setup (IntelliJ)
- Open
/beam(the repository root, NOTsdks/java) - Wait for indexing to complete
- Find
examples/java/build.gradleand click Run next to wordCount task to verify setup