IOStreams

IOStreams supports a consistent, streaming API for reading and writing files, regardless of whether the file is compressed, encrypted, local, or on a remote server.

Introduction

By using the IOStreams API, code can be written that processes files as if they were plain text and local.

In development, files can be stored locally, whereas in production the same files could be stored in AWS S3.

Additionally, the same code can transparently handle plain text, encrypted, or compressed files since IOStreams dynamically detects the file type, based on its extension(s).

For example, one customer sends files in plain text, another zip compressed, another gzip compressed, another PGP encrypted, and yet another sends an xlsx file. Traditionally, the code that processes these files has to handle each file type separately. IOStreams handles all of these file types transparently.
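To make the idea of extension-based detection concrete, here is a minimal sketch in plain Ruby. It is not how IOStreams is implemented: `KNOWN_STREAMS` and `streams_for` are hypothetical names, and the `bz2` extension spelling is an assumption. It only shows how trailing extensions could be read from right to left to decide which streams to apply.

```ruby
# Hypothetical sketch: map a file name's trailing extensions to stream names.
# KNOWN_STREAMS and streams_for are illustrative, not IOStreams internals.
KNOWN_STREAMS = {
  "gz"   => :gzip,
  "zip"  => :zip,
  "bz2"  => :bz2, # assumed extension spelling
  "pgp"  => :pgp,
  "xlsx" => :xlsx
}.freeze

# Returns the streams to apply when reading, outermost extension first.
def streams_for(file_name)
  parts = File.basename(file_name).split(".")
  parts.shift # drop the base name, keep only the extensions
  streams = []
  # Walk the extensions from right to left, stopping at the first unknown one.
  parts.reverse_each do |ext|
    stream = KNOWN_STREAMS[ext.downcase]
    break unless stream

    streams << stream
  end
  streams
end

p streams_for("sample.txt")              # prints []
p streams_for("sample.txt.gz")           # prints [:gzip]
p streams_for("demo/sample.csv.gz.pgp")  # prints [:pgp, :gzip]
```

In this sketch an unknown extension such as `txt` or `csv` ends the scan, so only the outer compression and encryption layers are reported.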

Features

Low memory utilization, even when processing very large files.
Parse JSON, CSV, PSV, or fixed width data on the fly.
Encrypt / Decrypt data on the fly.
Compress / Decompress data on the fly.
Change storage location / mechanism transparently without any code changes.

Streaming avoids high memory utilization since the file (or other source such as AWS S3) is read or written a block at a time.
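The block-at-a-time approach described above can be illustrated with Ruby's standard library alone. The sketch below is an assumption-laden illustration, not IOStreams code: `BLOCK_SIZE` and the in-memory payload are made up for the example. It decompresses gzip data one block at a time so that only a single block is held in memory at once.

```ruby
require "stringio"
require "zlib"

# Build a small gzip payload in memory so the sketch is self-contained.
payload = Zlib.gzip("Hello World " * 1_000)

BLOCK_SIZE = 1_024 # illustrative block size, not an IOStreams setting
blocks = 0
bytes  = 0

reader = Zlib::GzipReader.new(StringIO.new(payload))
begin
  # read(n) returns at most n decompressed bytes, and nil at end of data,
  # so the whole file is never loaded into memory at once.
  while (block = reader.read(BLOCK_SIZE))
    blocks += 1
    bytes  += block.bytesize
  end
ensure
  reader.close
end

puts "#{bytes} bytes read in #{blocks} blocks"
```

The same loop shape works for any source that supports `read(size)`, which is what makes it possible to swap the storage mechanism without changing the processing code.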

File Extensions

Zip
Gzip
BZip2
PGP (Requires GnuPG)
Xlsx (Reading)
Encryption using Symmetric Encryption

File Storage

File
AWS S3
Google Cloud Storage (Using the AWS S3 Client)
SFTP
HTTP(S) (Read only)

File formats

CSV
Fixed width formats
JSON
PSV

Example usages

Creating files

Write a string to a local file called sample.txt:

path = IOStreams.path("sample.txt")
path.write("Hello World")

Write a string to AWS S3, storing in the S3 bucket sample-bucket, under the path demo with a file name of sample.txt.

path = IOStreams.path("s3://sample-bucket/demo/sample.txt")
path.write("Hello World")

Write a string into a compressed file by adding the .gz extension to the file name:

path = IOStreams.path("sample.txt.gz")
path.write("Hello World")

Compress and encrypt the data into a PGP encrypted file, called sample.txt.pgp:

path = IOStreams.path("sample.txt.pgp")
# Recipient that can decrypt this file:
path.option(:pgp, recipient: "receiver@example.org")
path.write("Hello World")

Note: GnuPG needs to be installed locally for the above PGP example to work.

Write a string to an SFTP server with a host name of example.org, under the path demo, with a file name of sample.txt. The optional username and password are supplied as arguments.

path = IOStreams.path("sftp://example.org/demo/sample.txt", 
                      username: "example", 
                      password: "topsecret")
path.write("Hello World")

Write a string to AWS S3, storing it in the S3 bucket sample-bucket, under the path demo, encrypted with PGP, with a file name of sample.txt.pgp.

path = IOStreams.path("s3://sample-bucket/demo/sample.txt.pgp")
path.option(:pgp, recipient: "receiver@example.org")
path.write("Hello World")

Reading files

Read an entire local file called sample.txt, into a string:

path = IOStreams.path("sample.txt")
path.read
# => "Hello World"

Read an entire file called sample.txt, into a string, from the S3 bucket sample-bucket, under the path demo:

path = IOStreams.path("s3://sample-bucket/demo/sample.txt")
path.read
# => "Hello World"

Read an entire local file called sample.txt.gz, and decompress the contents into a string:

path = IOStreams.path("sample.txt.gz")
path.read
# => "Hello World"

Read an entire local file called sample.txt.pgp, then decrypt and decompress the contents into a string:

path = IOStreams.path("sample.txt.pgp")
path.read
# => "Hello World"

Note: GnuPG needs to be installed locally for the above PGP example to work.

Streaming Examples

When dealing with large files, it is important not to load the entire file into memory. Instead, read the file's data efficiently in chunks, lines, or records.

Read 128 characters at a time from a file:

path = IOStreams.path("sample.txt")
path.reader do |io|
  while (data = io.read(128))
    p data 
  end
end

Read one line at a time from the file:

path = IOStreams.path("sample.txt")
path.each do |line|
  puts line
end

Write data to the file in several pieces:

path = IOStreams.path("sample.txt")
path.writer do |io|
  io << "This"
  io << " is "
  io << "one line\n"
end

Write lines to the file. By adding :line to writer, each write appends a new line character.

path = IOStreams.path("sample.txt")
path.writer(:line) do |file|
  file << "these"
  file << "are"
  file << "all"
  file << "separate"
  file << "lines"
end

Reading CSV Files

Example CSV file, example.csv:

name,address,zip_code
Jack,There,1234
Joe,Over There somewhere,1234

Read each line from the CSV file as lines of strings:

path = IOStreams.path("example.csv")
path.each do |line|
  p line
end

Output:

"name,address,zip_code"
"Jack,There,1234"
"Joe,Over There somewhere,1234"

Read each row from the CSV file as arrays:

path = IOStreams.path("example.csv")
path.each(:array) do |array|
  p array
end

Output:

["name", "address", "zip_code"]
["Jack", "There", "1234"]
["Joe", "Over There somewhere", "1234"]

Read each row from the CSV file as key-value pairs, where the key is the CSV column header and the value is that row's value for the column:

path = IOStreams.path("example.csv")
path.each(:hash) do |record|
  p record
end

Output:

{"name"=>"Jack", "address"=>"There", "zip_code"=>"1234"}
{"name"=>"Joe", "address"=>"Over There somewhere", "zip_code"=>"1234"}
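For comparison, the header-to-key mapping shown above can be reproduced with Ruby's standard `csv` library. This is only a sketch of the concept, loading a small in-memory string rather than streaming a file, and it is not a description of how IOStreams parses CSV internally.

```ruby
require "csv"

# The same data as example.csv, held in memory for a self-contained sketch.
csv_text = <<~CSV
  name,address,zip_code
  Jack,There,1234
  Joe,Over There somewhere,1234
CSV

# headers: true treats the first row as column names, so every
# following row becomes a record keyed by those names.
records = CSV.parse(csv_text, headers: true).map(&:to_h)

records.each { |record| p record }
```

As in the output above, every value is returned as a string, including the zip code.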

Writing CSV Files

Write an array (row) at a time to the file. Each array is converted to CSV before being written to the file.

IOStreams.path("example.csv").writer(:array) do |io|
  io << ["name", "address", "zip_code"]
  io << ["Jack", "There", "1234"]
  io << ["Joe", "Over There somewhere", 1234]
end

Write a hash (record) at a time to the file. Each hash is converted to CSV before being written to the file. The header row is taken from the keys of the first hash that is written.

path = IOStreams.path("example.csv")
path.writer(:hash) do |stream|
  stream << {name: "Jack", address: "There", zip_code: 1234}
  stream << {zip_code: 1234, address: "Over There somewhere", name: "Joe"}
end

This time write the CSV data to a compressed zip file, by adding .zip to the file name.

path = IOStreams.path("example.csv.zip")
path.writer(:hash) do |stream|
  stream << {name: "Jack", address: "There", zip_code: 1234}
  stream << {zip_code: 1234, address: "Over There somewhere", name: "Joe"}
end

Changing the file name to change its compression, encryption, or even whether it is local or remote has no effect on the code reading from or writing to the path.

PSV Files

PSV files are faster to parse than CSV files, since CSV has complex rules for dealing with embedded quotes and newlines.

PSV files in IOStreams follow these simple rules:

Values are delimited using |.
Rows are delimited with new lines.
Values may not contain |, or new lines.
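The three rules above are simple enough to sketch in a few lines of plain Ruby. The helpers `parse_psv_row` and `render_psv_row` are hypothetical names for this illustration and are not part of the IOStreams API. Raising an error on a forbidden character is an assumption made for the sketch; the library may handle such values differently.

```ruby
# Split one PSV row into its values. Because values may not contain "|"
# or new lines, no quoting or escaping logic is needed.
def parse_psv_row(line)
  # The -1 limit keeps trailing empty values instead of dropping them.
  line.chomp.split("|", -1)
end

# Join values into one PSV row, rejecting values that break the rules.
# (Raising here is a choice made for this sketch only.)
def render_psv_row(values)
  values.map do |value|
    text = value.to_s
    if text.match?(/[|\r\n]/)
      raise ArgumentError, "PSV values may not contain | or new lines"
    end

    text
  end.join("|")
end

p parse_psv_row("Joe|Over There somewhere|1234\n")
# prints ["Joe", "Over There somewhere", "1234"]
puts render_psv_row(["Jack", "There", 1234])
# prints Jack|There|1234
```

Parsing is a single `split`, which is where the speed advantage over CSV comes from.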

Example PSV file, example.psv:

name|address|zip_code
Jack|There|1234
Joe|Over There somewhere|1234

Reading PSV Files

Read each row from the PSV file as key-value pairs, where the key is the PSV column header and the value is that row's value for the column:

path = IOStreams.path("example.psv")
path.each(:hash) do |record|
  p record
end

Output:

{"name"=>"Jack", "address"=>"There", "zip_code"=>"1234"}
{"name"=>"Joe", "address"=>"Over There somewhere", "zip_code"=>"1234"}

Writing PSV Files

Write a hash (record) at a time to the file. Each hash is converted to PSV before being written to the file. The header row is taken from the keys of the first hash that is written.

path = IOStreams.path("example.psv")
path.writer(:hash) do |stream|
  stream << {name: "Jack", address: "There", zip_code: 1234}
  stream << {zip_code: 1234, address: "Over There somewhere", name: "Joe"}
end
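To show why the second hash above can list its keys in a different order, here is a minimal sketch of a hash writer that takes its header from the first record. `PsvHashWriter` is a hypothetical class written for this illustration and is not the IOStreams implementation.

```ruby
require "stringio"

# Hypothetical sketch: a PSV writer that derives its header from the
# keys of the first hash written to it.
class PsvHashWriter
  def initialize(io)
    @io     = io
    @header = nil
  end

  def <<(record)
    if @header.nil?
      # The first record decides the column order for the whole file.
      @header = record.keys
      @io << @header.join("|") << "\n"
    end
    # Later records are written in header order, whatever their key order.
    @io << @header.map { |key| record[key] }.join("|") << "\n"
    self
  end
end

out    = StringIO.new
writer = PsvHashWriter.new(out)
writer << {name: "Jack", address: "There", zip_code: 1234}
writer << {zip_code: 1234, address: "Over There somewhere", name: "Joe"}
puts out.string
```

Under these assumptions the output matches the example.psv file shown earlier, with Joe's values reordered to follow the header.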

Getting Started

Start with the IOStreams tutorial for a great introduction to IOStreams.

