Tuesday, July 16, 2024

Use SeaweedFS with Apache jclouds

Before this post, documentation linking these two pieces of software was very sparse. It might be common sense to some, but I could find hardly any mention of how to set them up to work in tandem. So let's cut to the chase.

The main draw of Apache jclouds is its support for the S3 API in Java, across many platforms. What mattered to us in particular was its BlobStore API.

Addition to pom.xml

<properties>
        <jclouds.version>2.6.0</jclouds.version>
</properties>

<dependency>
        <groupId>org.apache.jclouds</groupId>
        <artifactId>jclouds-all</artifactId>
        <version>${jclouds.version}</version>
</dependency>

Code snippet

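import java.nio.charset.StandardCharsets;
import com.google.common.io.ByteSource;
import org.jclouds.ContextBuilder;
import org.jclouds.blobstore.BlobStore;
import org.jclouds.blobstore.BlobStoreContext;
import org.jclouds.blobstore.domain.Blob;

// Initialise connectivity using the generic "s3" provider
BlobStoreContext context = ContextBuilder.newBuilder("s3")
    .credentials(identity, credential)
    // Must point at the SeaweedFS S3 gateway (port 8333 by default), not the master port
    .endpoint(seaweedS3Url)
    .buildView(BlobStoreContext.class);

// Access the BlobStore
BlobStore blobStore = context.getBlobStore();

// Wrap the string payload; size() can throw IOException
ByteSource payload = ByteSource.wrap(payloadStr.getBytes(StandardCharsets.UTF_8));
Blob blob = blobStore.blobBuilder(uuid)
    .payload(payload)
    .contentLength(payload.size())
    .build();

// Upload the Blob (the container, i.e. bucket, must already exist)
blobStore.putBlob(containerName, blob);

// Don't forget to close the context when you're done!
context.close();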


The above was lifted almost verbatim from the jclouds page. The one point that needs attention is newBuilder("s3"), the generic S3 provider, which is used in place of the "aws-s3" provider named in their original sample.


"But SeaweedFS already has a large number of client libraries provided by the community!", you exclaimed. 

And you'd be correct. Those libraries, however, only work with SeaweedFS. What I neglected to mention earlier is that the S3 API offered by jclouds works just as well with any other S3-compatible (enterprise-grade) product. By integrating the two, our development environment can use a lightweight option like SeaweedFS while the main production deployment runs on something heftier, all with the same library, which is offered by Apache no less.
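To make that concrete: if the endpoint and credentials come from configuration, nothing else in the code has to know which S3 product is behind it. Below is a minimal sketch of that idea in plain Java. The class name S3Settings and the S3_ENDPOINT, S3_IDENTITY and S3_CREDENTIAL keys are made up for this example, and the localhost:8333 value assumes the default SeaweedFS S3 port.

```java
import java.util.Map;

/** Hypothetical holder for the values that differ between environments. */
class S3Settings {
    final String endpoint;
    final String identity;
    final String credential;

    S3Settings(String endpoint, String identity, String credential) {
        this.endpoint = endpoint;
        this.identity = identity;
        this.credential = credential;
    }

    /** Builds settings from a map such as System.getenv(); the key names are assumptions. */
    static S3Settings from(Map<String, String> env) {
        return new S3Settings(
            require(env, "S3_ENDPOINT"),
            require(env, "S3_IDENTITY"),
            require(env, "S3_CREDENTIAL"));
    }

    /** Fails fast with a readable message when a setting is missing or empty. */
    private static String require(Map<String, String> env, String key) {
        String value = env.get(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing setting: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        // Development values pointing at a local SeaweedFS S3 gateway (assumed default port)
        S3Settings dev = S3Settings.from(Map.of(
            "S3_ENDPOINT", "http://localhost:8333",
            "S3_IDENTITY", "dev-key",
            "S3_CREDENTIAL", "dev-secret"));
        System.out.println("Would connect to " + dev.endpoint);
    }
}
```

The three fields would then feed credentials(...) and endpoint(...) on the ContextBuilder, so switching from SeaweedFS to another S3-compatible store becomes a deployment setting rather than a code change.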


This was still largely unexplored territory for some of us, so setting up SeaweedFS was more nuanced than we expected. 

This is what it took:

  1. Start the master server: sudo ./weed master &
  2. Start a volume server: sudo ./weed volume -dir="/data/seaweed/data1" -max=5 -mserver="localhost:9333" -port=8080 &
  3. Start filer and S3 service: sudo ./weed filer -s3 &

There was a gotcha in there that I had to figure out on my own. The Getting Started page only mentioned starting the master and volume servers. You could even interact with the service using cURL once it was started. But the Java code still couldn't talk to SeaweedFS.

The wiki even had a section describing how to use s3cmd to communicate with SeaweedFS. I just didn't get it, until I found this article, which vaguely mentioned that the server needed to be started specifically with s3 as an option.

I had to return to ./weed --help for more clues. In hindsight, none of this would have been an issue had I run the server option instead of running the master and volume processes separately. But I felt it necessary to adopt the structure we might eventually expect.

And with filer -s3 started, the s3cmd connection could be made at last. I reckon I'd still be scratching my head had I not used s3cmd to verify the setup outside of the code. I could finally make my bucket and be on my way (because I think our actual code probably shouldn't be creating buckets on its own that easily).
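Another check that would have saved me some time is simply probing which ports are listening before blaming the Java code. Here is a minimal sketch using only the JDK. The class name PortProbe is made up, ports 9333 and 8080 come from the commands above, and 8888 (filer) and 8333 (S3) are what I understand to be the SeaweedFS defaults, so adjust them if your setup differs.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that reports which SeaweedFS ports accept TCP connections. */
class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: nothing usable on that port
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> ports = new LinkedHashMap<>();
        ports.put("master", 9333); // from -mserver in the volume command
        ports.put("volume", 8080); // from -port in the volume command
        ports.put("filer", 8888);  // assumed default
        ports.put("s3", 8333);     // assumed default
        for (Map.Entry<String, Integer> entry : ports.entrySet()) {
            boolean up = isListening("127.0.0.1", entry.getValue(), 500);
            System.out.printf("%-6s :%d %s%n", entry.getKey(), entry.getValue(), up ? "up" : "DOWN");
        }
    }
}
```

With only the master and volume servers running, the last two entries should report DOWN, which is exactly the situation I was stuck in. An open port only shows that something is listening, so s3cmd is still the better end-to-end check.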