API

S3FileSystem([anon, key, secret, token, ...]) Access S3 as if it were a file system.
S3FileSystem.cat(path) Returns contents of file
S3FileSystem.du(path[, total, deep]) Bytes in keys at path
S3FileSystem.exists(path) Does such a file/directory exist?
S3FileSystem.get(path, filename) Stream data from file at path to local filename
S3FileSystem.glob(path) Find files by glob-matching.
S3FileSystem.info(path[, refresh]) Detail on the specific file pointed to by path.
S3FileSystem.ls(path[, detail, refresh]) List single “directory” with or without details
S3FileSystem.mkdir(path[, acl]) Make new bucket or empty key
S3FileSystem.mv(path1, path2) Move file between locations on S3
S3FileSystem.open(path[, mode, block_size, ...]) Open a file for reading or writing
S3FileSystem.put(filename, path) Stream data from local filename to file at path
S3FileSystem.read_block(fn, offset, length) Read a block of bytes from an S3 file
S3FileSystem.rm(path[, recursive]) Remove keys and/or bucket.
S3FileSystem.tail(path[, size]) Return last bytes of file
S3FileSystem.touch(path[, acl]) Create empty key
S3File(s3, path[, mode, block_size, acl, ...]) Open S3 key as a file.
S3File.close() Close file
S3File.flush([force, retries]) Write buffered data to S3.
S3File.info() File information about this path
S3File.read([length]) Return data from cache, or fetch pieces as necessary
S3File.seek(loc[, whence]) Set current file location
S3File.tell() Current file location
S3File.write(data) Write data to buffer.
S3Map(root[, s3, check, create]) Wrap an S3FileSystem as a mutable mapping.
class s3fs.core.S3FileSystem(anon=False, key=None, secret=None, token=None, use_ssl=True, client_kwargs=None, requester_pays=False, default_block_size=None, default_fill_cache=True, config_kwargs=None, **kwargs)[source]

Access S3 as if it were a file system.

This exposes a filesystem-like API (ls, cp, open, etc.) on top of S3 storage.

Provide credentials either explicitly (key=, secret=) or depend on boto’s credential methods. See boto3 documentation for more information. If no credentials are available, use anon=True.

Parameters:

anon : bool (False)

Whether to use anonymous connection (public buckets only). If False, uses the key/secret given, or boto’s credential resolver (environment variables, config files, EC2 IAM server, in that order)

key : string (None)

If not anonymous, use this access key ID, if specified

secret : string (None)

If not anonymous, use this secret access key, if specified

token : string (None)

If not anonymous, use this security token, if specified

use_ssl : bool (True)

Whether to use SSL in connections to S3; may be faster without, but insecure

client_kwargs : dict of parameters for the boto3 client

requester_pays : bool (False)

Whether to make requests against requester-pays buckets (the requester, rather than the bucket owner, is billed for the request).

default_block_size : int (None)

If given, the default block size used by open() when no specific value is given at call time. The built-in default is 5MB.

default_fill_cache : bool (True)

Whether to use cache filling with open by default. Refer to S3File.open.

config_kwargs : dict of parameters passed to botocore.client.Config

kwargs : other parameters for boto3 session

Examples

>>> s3 = S3FileSystem(anon=False)  
>>> s3.ls('my-bucket/')  
['my-file.txt']
>>> with s3.open('my-bucket/my-file.txt', mode='rb') as f:  
...     print(f.read())  
b'Hello, world!'

Methods

bulk_delete(pathlist) Remove multiple keys with one call
cat(path) Returns contents of file
chmod(path, acl) Set Access Control on a bucket/key
connect([refresh]) Establish S3 connection object.
copy(path1, path2) Copy file between locations on S3
current() Return the most recently created S3FileSystem
du(path[, total, deep]) Bytes in keys at path
exists(path) Does such a file/directory exist?
get(path, filename) Stream data from file at path to local filename
get_delegated_s3pars([exp]) Get temporary credentials from STS, appropriate for sending across a network.
getxattr(path, attr_name) Get an attribute from the metadata.
glob(path) Find files by glob-matching.
head(path[, size]) Return first bytes of file
info(path[, refresh]) Detail on the specific file pointed to by path.
invalidate_cache([path])
ls(path[, detail, refresh]) List single “directory” with or without details
merge(path, filelist) Create single S3 file from list of S3 files
metadata(path[, refresh]) Return metadata of path.
mkdir(path[, acl]) Make new bucket or empty key
mv(path1, path2) Move file between locations on S3
open(path[, mode, block_size, acl, fill_cache]) Open a file for reading or writing
put(filename, path) Stream data from local filename to file at path
read_block(fn, offset, length[, delimiter]) Read a block of bytes from an S3 file
rm(path[, recursive]) Remove keys and/or bucket.
rmdir(path) Remove empty key or bucket
setxattr(path, **kw_args) Set metadata.
tail(path[, size]) Return last bytes of file
touch(path[, acl]) Create empty key
url(path[, expires]) Generate presigned URL to access path by HTTP
walk(path[, refresh]) Return all real keys below path
bulk_delete(pathlist)[source]

Remove multiple keys with one call

Parameters:

pathlist : list of strings

The keys to remove, must all be in the same bucket.
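A minimal illustrative call (the keys below are hypothetical and must share one bucket; s3 is an existing S3FileSystem instance):

>>> s3.bulk_delete(['my-bucket/tmp/part-0.csv', 'my-bucket/tmp/part-1.csv'])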

cat(path)[source]

Returns contents of file

chmod(path, acl)[source]

Set Access Control on a bucket/key

See http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl

Parameters:

path : string

the object to set

acl : string

the value of ACL to apply
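For example, applying a canned ACL to an object (illustrative key name; 'public-read' is one of the canned ACLs described in the AWS document linked above):

>>> s3.chmod('my-bucket/my-file.txt', acl='public-read')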

connect(refresh=False)[source]

Establish S3 connection object.

Parameters:

refresh : bool (False)

Whether to create a new connection object, even if a cached one already exists

copy(path1, path2)[source]

Copy file between locations on S3

classmethod current()[source]

Return the most recently created S3FileSystem

If no S3FileSystem has been created, then create one

du(path, total=False, deep=False)[source]

Bytes in keys at path

exists(path)[source]

Does such a file/directory exist?

get(path, filename)[source]

Stream data from file at path to local filename

get_delegated_s3pars(exp=3600)[source]

Get temporary credentials from STS, appropriate for sending across a network. Only relevant where the key/secret were explicitly provided.

Parameters:

exp : int

Time in seconds that credentials are good for

Returns:

dict of parameters
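A sketch of the intended use, under the assumption that the returned dict can be passed as keyword arguments to a new S3FileSystem (names here are illustrative):

>>> pars = s3.get_delegated_s3pars(exp=600)
>>> remote_s3 = S3FileSystem(**pars)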

getxattr(path, attr_name)[source]

Get an attribute from the metadata.

Examples

>>> mys3fs.getxattr('mykey', 'attribute_1')  
'value_1'
glob(path)[source]

Find files by glob-matching.

Note that the bucket part of the path must not contain a “*”
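For example (hypothetical keys and output; the wildcard appears only after the bucket name):

>>> s3.glob('my-bucket/2017/*.csv')
['my-bucket/2017/jan.csv', 'my-bucket/2017/feb.csv']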

head(path, size=1024)[source]

Return first bytes of file

info(path, refresh=False)[source]

Detail on the specific file pointed to by path.

Gets details only for a specific key; directories/buckets cannot be used with info.

ls(path, detail=False, refresh=False)[source]

List single “directory” with or without details

merge(path, filelist)[source]

Create single S3 file from list of S3 files

Uses multi-part, no data is downloaded. The original files are not deleted.

Parameters:

path : str

The final file to produce

filelist : list of str

The paths, in order, to assemble into the final file.
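A minimal sketch (hypothetical keys; the parts are combined with a multipart upload, so no data is downloaded locally):

>>> s3.merge('my-bucket/combined.csv',
...          ['my-bucket/part-0.csv', 'my-bucket/part-1.csv'])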

metadata(path, refresh=False)[source]

Return metadata of path.

Metadata is cached unless refresh=True.

Parameters:

path : string/bytes

filename to get metadata for

refresh : bool (=False)

if False, look in local cache for file metadata first

mkdir(path, acl='')[source]

Make new bucket or empty key

mv(path1, path2)[source]

Move file between locations on S3

open(path, mode='rb', block_size=None, acl='', fill_cache=None)[source]

Open a file for reading or writing

Parameters:

path: string

Path of file on S3

mode: string

One of ‘rb’ or ‘wb’

block_size: int

Size of data-node blocks if reading

fill_cache: bool

If True, when seeking to a new part of the file beyond the current buffer, the buffer will be filled between the sections to best support random access. When reading only a few specific chunks out of a file, performance may be better if False.
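For instance, when reading only a few scattered ranges from a large object, one might pick a custom block size and disable cache filling (illustrative path and sizes):

>>> with s3.open('my-bucket/large.bin', mode='rb',
...              block_size=2**20, fill_cache=False) as f:
...     f.seek(100 * 2**20)
...     chunk = f.read(1024)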

put(filename, path)[source]

Stream data from local filename to file at path
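A simple round trip between local disk and S3 might look like this (the filenames and key are hypothetical):

>>> s3.put('local.csv', 'my-bucket/remote.csv')
>>> s3.get('my-bucket/remote.csv', 'local_copy.csv')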

read_block(fn, offset, length, delimiter=None)[source]

Read a block of bytes from an S3 file

Starting at offset of the file, read length bytes. If delimiter is set then we ensure that the read starts and stops at delimiter boundaries that follow the locations offset and offset + length. If offset is zero then we start at zero. The bytestring returned WILL include the end delimiter string.

If offset+length is beyond the eof, reads to eof.

Parameters:

fn: string

Path to filename on S3

offset: int

Byte offset to start read

length: int

Number of bytes to read

delimiter: bytes (optional)

Ensure reading starts and stops at delimiter bytestring

See also

distributed.utils.read_block

Examples

>>> s3.read_block('data/file.csv', 0, 13)  
b'Alice, 100\nBo'
>>> s3.read_block('data/file.csv', 0, 13, delimiter=b'\n')  
b'Alice, 100\nBob, 200\n'

Use length=None to read to the end of the file.

>>> s3.read_block('data/file.csv', 0, None, delimiter=b'\n')  
b'Alice, 100\nBob, 200\nCharlie, 300'

rm(path, recursive=False)[source]

Remove keys and/or bucket.

Parameters:

path : string

The location to remove.

recursive : bool (False)

Whether to also remove all entries below the path, i.e., those returned by walk().
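For example, deleting an entire prefix (hypothetical key; recursive deletion removes everything that walk() would return below it):

>>> s3.rm('my-bucket/old-data', recursive=True)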

rmdir(path)[source]

Remove empty key or bucket

setxattr(path, **kw_args)[source]

Set metadata.

Attributes have to be of the form documented in the Metadata Reference (http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#object-metadata).

Parameters: key-value pairs like field="value", where the values must be strings. Existing fields are not altered unless they appear here; if a value is None, that field is deleted.

Examples

>>> mys3file.setxattr(attribute_1='value1', attribute_2='value2')  


tail(path, size=1024)[source]

Return last bytes of file

touch(path, acl='')[source]

Create empty key

If path is a bucket only, attempt to create bucket.

url(path, expires=3600)[source]

Generate presigned URL to access path by HTTP

Parameters:

path : string

the key path we are interested in

expires : int

the number of seconds this signature will be good for.
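An illustrative call (hypothetical key); the returned string is a time-limited link that can be fetched over HTTP(S) without separate S3 credentials:

>>> s3.url('my-bucket/my-file.txt', expires=600)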

walk(path, refresh=False)[source]

Return all real keys below path
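For example (hypothetical keys and output; unlike ls, the listing is recursive and only real keys are returned):

>>> s3.walk('my-bucket/data')
['my-bucket/data/2017/part-0.csv', 'my-bucket/data/2017/part-1.csv']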

class s3fs.core.S3File(s3, path, mode='rb', block_size=5242880, acl='', fill_cache=True)[source]

Open S3 key as a file. Data is only loaded and cached on demand.

Parameters:

s3 : S3FileSystem

The filesystem instance this file belongs to

path : string

S3 bucket and key to access, in the form 'bucket/key'

block_size : int

read-ahead size for finding delimiters

fill_cache: bool

If True, when seeking to a new part of the file beyond the current buffer, the buffer will be filled between the sections to best support random access. When reading only a few specific chunks out of a file, performance may be better if False.

See also

S3FileSystem.open
used to create S3File objects

Examples

>>> s3 = S3FileSystem()  
>>> with s3.open('my-bucket/my-file.txt', mode='rb') as f:  
...     ...  

Methods

close() Close file
flush([force, retries]) Write buffered data to S3.
getxattr(xattr_name) Get an attribute from the metadata.
info() File information about this path
metadata([refresh]) Return metadata of file.
next()
read([length]) Return data from cache, or fetch pieces as necessary
readable() Return whether the S3File was opened for reading
readline([length]) Read and return a line from the stream.
readlines() Return all lines in a file as a list
seek(loc[, whence]) Set current file location
seekable() Return whether the S3File is seekable (only in read mode)
setxattr(**kwargs) Set metadata.
tell() Current file location
url() HTTP URL to read this file (if it already exists)
writable() Return whether the S3File was opened for writing
write(data) Write data to buffer.
close()[source]

Close file

If in write mode, the key is only finalized upon close, and will then be available to other processes.

flush(force=False, retries=10)[source]

Write buffered data to S3.

Uploads the current buffer, if it is larger than the block-size.

Due to the S3 multipart upload policy, you can only safely force flush to S3 when you are finished writing. It is unsafe to call this function repeatedly.

Parameters:

force : bool

When closing, write the last block even if it is smaller than blocks are allowed to be.

getxattr(xattr_name)[source]

Get an attribute from the metadata. See getxattr().

Examples

>>> mys3file.getxattr('attribute_1')  
'value_1'
info()[source]

File information about this path

metadata(refresh=False)[source]

Return metadata of file. See metadata().

Metadata is cached unless refresh=True.

read(length=-1)[source]

Return data from cache, or fetch pieces as necessary

Parameters:

length : int (-1)

Number of bytes to read; if <0, all remaining bytes.

readable()[source]

Return whether the S3File was opened for reading

readline(length=-1)[source]

Read and return a line from the stream.

If length is specified, at most length bytes will be read.

readlines()[source]

Return all lines in a file as a list

seek(loc, whence=0)[source]

Set current file location

Parameters:

loc : int

byte location

whence : {0, 1, 2}

from start of file, current location or end of file, resp.
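For instance, reading the last 100 bytes of an object by seeking relative to the end (hypothetical key; seeking is only available in read mode):

>>> with s3.open('my-bucket/my-file.txt', mode='rb') as f:
...     f.seek(-100, 2)
...     last_bytes = f.read()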

seekable()[source]

Return whether the S3File is seekable (only in read mode)

setxattr(**kwargs)[source]

Set metadata. See setxattr().

Examples

>>> mys3file.setxattr(attribute_1='value1', attribute_2='value2')  
tell()[source]

Current file location

url()[source]

HTTP URL to read this file (if it already exists)

writable()[source]

Return whether the S3File was opened for writing

write(data)[source]

Write data to buffer.

Buffer only sent to S3 on flush() or if buffer is greater than or equal to blocksize.

Parameters:

data : bytes

Set of bytes to be written.
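A minimal write sketch (hypothetical key; as noted under close(), the key only becomes visible once the file is closed):

>>> with s3.open('my-bucket/output.bin', mode='wb') as f:
...     f.write(b'some bytes to upload')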

class s3fs.mapping.S3Map(root, s3=None, check=False, create=False)[source]

Wrap an S3FileSystem as a mutable mapping.

The keys of the mapping become files under the given root, and the values (which must be bytes) the contents of those files.

Parameters:

root : string

prefix for all the files (perhaps just a bucket name)

s3 : S3FileSystem

check : bool (=False)

If True, performs a touch at the location to check writeability.

Examples

>>> s3 = s3fs.S3FileSystem() 
>>> d = s3fs.S3Map('mybucket/mapstore/', s3=s3) 
>>> d['loc1'] = b'Hello World' 
>>> list(d.keys()) 
['loc1']
>>> d['loc1'] 
b'Hello World'

Methods

clear() Remove all keys below root - empties out mapping
get(k[, d]) Return D[k] if k is in D, else d
items() Return an iterator over the (key, value) pairs
keys() Return an iterator over the keys
pop(k[, d]) Remove key k and return its value; if k is not found, return d if given, otherwise raise KeyError
popitem() Remove and return some (key, value) pair as a 2-tuple; raise KeyError if the mapping is empty
setdefault(k[, d]) Return D.get(k, d), also setting D[k] = d if k is not in D
update(E, **F) If E has a .keys() method, does: for k in E: D[k] = E[k], then applies any keyword arguments
values() Return an iterator over the values