Path

A path describes the data store and the attributes of the file stored there. To apply a streaming pipeline, IOStreams needs to know where the data is stored and how it should be accessed.

When a path is created, it takes the name of the file, which can also be a URI, followed by any arguments specific to that kind of path. IOStreams infers the file storage mechanism from the supplied URI.
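The idea of inferring storage from the URI scheme can be sketched with Ruby's standard `URI` library. This is a minimal illustration only, not IOStreams' actual implementation; the `storage_for` method and the `STORAGE_BY_SCHEME` table are hypothetical names introduced here.

```ruby
require "uri"

# Hypothetical mapping from URI scheme to storage mechanism.
# Illustrates the concept only; this is not IOStreams' internal code.
STORAGE_BY_SCHEME = {
  "s3"    => :s3,
  "sftp"  => :sftp,
  "http"  => :http,
  "https" => :http
}.freeze

# Returns the storage mechanism implied by a file name or URI.
# Assumes the name is a valid URI reference (e.g. no unescaped spaces).
def storage_for(file_name)
  scheme = URI.parse(file_name).scheme
  # A plain relative or absolute path has no scheme, so fall back to local files.
  STORAGE_BY_SCHEME.fetch(scheme, :file)
end
```

For example, `storage_for("somewhere/example.csv")` returns `:file`, while `storage_for("s3://bucket-name/path/example.csv")` returns `:s3`.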

IOStreams Path already supports accessing files in the following places:

- Local files
- AWS S3 (s3://)
- SFTP (sftp://)
- HTTP and HTTPS (http://, https://)

Are you using another cloud provider and want to add support for it? Check out the supplied IOStreams S3 path provider for an example of what is required. Good luck with the pull request, and let us know in Gitter chat if you have any questions.

File

The simplest case is a file on the local disk:

path = IOStreams.path("somewhere/example.csv")

Optional Arguments:

AWS S3 (s3://)

When the supplied file name is an s3:// URI, IOStreams infers that the file is stored in AWS S3. For example, if AWS is configured locally:

path = IOStreams.path("s3://bucket-name/path/example.csv")
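An s3:// URI follows the usual `s3://bucket/key` layout: the host component names the bucket and the remaining path is the object key. The sketch below shows that split with Ruby's `URI` library; `split_s3_uri` is a hypothetical helper for illustration and is not part of IOStreams.

```ruby
require "uri"

# Hypothetical helper (not part of IOStreams): split an s3:// URI into
# its bucket name and object key.
def split_s3_uri(s3_uri)
  uri = URI.parse(s3_uri)
  raise ArgumentError, "Not an s3:// URI: #{s3_uri}" unless uri.scheme == "s3"

  # The host is the bucket; the path without its leading "/" is the key.
  [uri.host, uri.path.delete_prefix("/")]
end
```

Applied to the example above, `split_s3_uri("s3://bucket-name/path/example.csv")` returns `["bucket-name", "path/example.csv"]`.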

Required Arguments:

Optional Arguments:

Writer specific options:

SFTP (sftp://)

When the supplied file name is an sftp:// URI, IOStreams infers that the file is on a remote SFTP server:

path = IOStreams.path("sftp://hostname/path/example.csv")

Read a file from a remote SFTP server:

IOStreams.path("sftp://example.org/path/file.txt", 
               username: "jbloggs", 
               password: "secret", 
               compression: false).
  reader do |input|
    puts input.read
  end

Raises Net::SFTP::StatusException when the file could not be read.

Write to a file on a remote SFTP server:

IOStreams.path("sftp://example.org/path/file.txt", 
               username: "jbloggs", 
               password: "secret", 
               compression: false).
  writer do |output|
    output.write('Hello World')
  end

Display the contents of a remote file, supplying the username and password in the URL:

IOStreams.path("sftp://jack:OpenSesame@test.com:22/path/file_name.csv").reader do |io|
  puts io.read
end
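The connection details embedded in such a URL can be read with Ruby's `URI` library, as sketched below. `sftp_connection_details` is a hypothetical helper, not IOStreams code, and defaulting to port 22 is an assumption based on the standard SSH port.

```ruby
require "uri"

# Hypothetical helper (not part of IOStreams): extract the connection
# details from an sftp:// URL that may embed a username and password.
def sftp_connection_details(url)
  uri = URI.parse(url)
  {
    host:     uri.host,
    port:     uri.port || 22, # assumption: standard SSH port when none is given
    username: uri.user,       # nil when the URL carries no credentials
    password: uri.password,   # returned as written; not percent-decoded here
    path:     uri.path
  }
end
```

Characters such as `@` or `:` in a password must be percent-encoded in the URL, and this sketch returns them still encoded.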

Use an identity file instead of a password to authenticate:

path = IOStreams.path("sftp://test.com/path/file_name.csv",
                      username: "jack",
                      ssh_options: {IdentityFile: "~/.ssh/private_key"})
path.reader do |io|
  puts io.read
end

Pass in the IdentityKey itself instead of a password to authenticate. For example, retrieve the identity key stored in Secret Config:

identity_key = SecretConfig.fetch("suppliers/sftp/identity_key")

path = IOStreams.path("sftp://test.com/path/file_name.csv", 
                      username: "jack", 
                      ssh_options: {IdentityKey: identity_key})
path.reader do |io|
  puts io.read
end

Required Arguments:

Optional Arguments:

ssh_options: Any other options supported by ssh_config. Run man ssh_config to see all available options.
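The OpenSSH sftp client accepts ssh_config options on the command line through its `-o` flag, for example `-o IdentityFile=~/.ssh/private_key`. As an illustration only, an `ssh_options` hash could be turned into such arguments as sketched below. This is an assumption about how the options might be consumed, not a description of IOStreams internals, and `ssh_option_args` is a hypothetical name.

```ruby
# Hypothetical helper (not part of IOStreams): convert an ssh_options hash
# into "-o Key=Value" arguments in the format used by ssh_config.
def ssh_option_args(ssh_options)
  ssh_options.flat_map { |key, value| ["-o", "#{key}=#{value}"] }
end
```

For example, `ssh_option_args({IdentityFile: "~/.ssh/private_key"})` returns `["-o", "IdentityFile=~/.ssh/private_key"]`.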

HTTP (http://, https://)

Read from a remote file over HTTP or HTTPS using an HTTP GET request.

IOStreams.path('https://www5.fdic.gov/idasp/Offices2.zip').read

Notes:

Required Arguments:

Optional Arguments:

path = IOStreams.path("http://hostname/path/example.csv")

Similarly, when using HTTPS:

path = IOStreams.path("https://hostname/path/example.csv")

In both cases IOStreams infers that the file lives on an HTTP server and returns an IOStreams::Paths::HTTP path.
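To show what reading over an HTTP GET involves, here is a minimal sketch using Ruby's standard `Net::HTTP` that streams the response body in chunks instead of loading it into memory all at once. It is not how IOStreams::Paths::HTTP is implemented; `build_get_request` and `stream_http_get` are hypothetical names.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: parse the URL and build the GET request for it.
def build_get_request(url)
  uri = URI.parse(url)
  [uri, Net::HTTP::Get.new(uri)]
end

# Hypothetical helper (not IOStreams): perform the GET and yield the body
# chunk by chunk as it arrives, using TLS when the scheme is https.
def stream_http_get(url)
  uri, request = build_get_request(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request) do |response|
      # Treat non-2xx responses as errors rather than reading their body.
      raise "HTTP #{response.code} for #{url}" unless response.is_a?(Net::HTTPSuccess)

      response.read_body { |chunk| yield chunk }
    end
  end
end
```

Usage: `stream_http_get("https://hostname/path/example.csv") { |chunk| print chunk }`.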

Using root paths

If root paths have been set up (see Config for how to add root paths), IOStreams.join can be used instead of IOStreams.path.

The key difference is that IOStreams.join joins the supplied path(s) with the default or named root path, so the entire path does not need to be supplied.

Set the default root path in an initializer.

IOStreams.add_root(:default, "/var/my_app/files")

The following code:

path = IOStreams.path("/var/my_app/files", "sample", "example.csv")
path.writer(:line) do |io|
  io << "Welcome"
  io << "To IOStreams"
end

Can be reduced to:

path = IOStreams.join("sample", "example.csv")
path.writer(:line) do |io|
  io << "Welcome"
  io << "To IOStreams"
end

Most importantly, the root path information and storage mechanism are externalized from the application code.

For example, to make the above code write to S3, change the initializer to:

IOStreams.add_root(:default, "s3://my-app-bucket-name/files")
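The effect of a root path can be modeled with a small registry plus `File.join`. The sketch below is a toy model, not IOStreams' implementation: `ROOTS`, `add_root` and `join_root` are hypothetical names. It shows why application code that only names `"sample", "example.csv"` keeps working when the initializer switches the default root from a local directory to S3.

```ruby
# Toy model of named root paths (hypothetical; not IOStreams' code).
ROOTS = {}

# Register or replace a named root, as an initializer would.
def add_root(name, root)
  ROOTS[name.to_sym] = root
end

# Join path elements onto a named root, defaulting to :default.
def join_root(*elements, root: :default)
  base = ROOTS.fetch(root.to_sym) { raise ArgumentError, "Unknown root: #{root.inspect}" }
  File.join(base, *elements)
end
```

With `add_root(:default, "/var/my_app/files")`, `join_root("sample", "example.csv")` yields `"/var/my_app/files/sample/example.csv"`; after `add_root(:default, "s3://my-app-bucket-name/files")`, the same call yields `"s3://my-app-bucket-name/files/sample/example.csv"`.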