API
S3FileSystem([anon, key, secret, token, …]) | Access S3 as if it were a file system.
S3FileSystem.cat(path[, recursive, on_error]) | Fetch (potentially multiple) paths' contents
S3FileSystem.du(path[, total, maxdepth]) | Space used by files within a path
S3FileSystem.exists(path) |
S3FileSystem.get(rpath, lpath[, recursive]) | Copy file(s) to local.
S3FileSystem.glob(path, **kwargs) | Find files by glob-matching.
S3FileSystem.info(path[, version_id, refresh]) | Give details of entry at path
S3FileSystem.ls(path[, detail, refresh]) | List single "directory" with or without details
S3FileSystem.mkdir(path[, acl, create_parents]) |
S3FileSystem.mv(path1, path2[, recursive, …]) | Move file(s) from one location to another
S3FileSystem.open(path[, mode, block_size, …]) | Return a file-like object from the filesystem
S3FileSystem.put(lpath, rpath[, recursive]) | Copy file(s) from local.
S3FileSystem.read_block(fn, offset, length) | Read a block of bytes
S3FileSystem.rm(path[, recursive]) | Delete files.
S3FileSystem.tail(path[, size]) | Get the last size bytes from file
S3FileSystem.touch(path[, truncate, data]) | Create empty file or truncate
S3File(s3, path[, mode, block_size, acl, …]) | Open S3 key as a file.
S3File.close() | Close file
S3File.flush([force]) | Write buffered data to backend store.
S3File.info() | File information about this path
S3File.read([length]) | Return data from cache, or fetch pieces as necessary
S3File.seek(loc[, whence]) | Set current file location
S3File.tell() | Current file location
S3File.write(data) | Write data to buffer.
S3Map(root, s3[, check, create]) | Mirror previous class, not implemented in fsspec
-
class s3fs.core.S3FileSystem(anon=False, key=None, secret=None, token=None, use_ssl=True, client_kwargs=None, requester_pays=False, default_block_size=None, default_fill_cache=True, default_cache_type='bytes', version_aware=False, config_kwargs=None, s3_additional_kwargs=None, session=None, username=None, password=None, asynchronous=False, loop=None, **kwargs)[source]
Access S3 as if it were a file system.
This exposes a filesystem-like API (ls, cp, open, etc.) on top of S3 storage.
Provide credentials either explicitly (key=, secret=) or depend on boto's credential methods. See botocore documentation for more information. If no credentials are available, use anon=True.
Parameters: - anon : bool (False)
Whether to use anonymous connection (public buckets only). If False, uses the key/secret given, or boto's credential resolver (client_kwargs, environment variables, config files, EC2 IAM server, in that order)
- key : string (None)
If not anonymous, use this access key ID, if specified
- secret : string (None)
If not anonymous, use this secret access key, if specified
- token : string (None)
If not anonymous, use this security token, if specified
- use_ssl : bool (True)
Whether to use SSL in connections to S3; may be faster without, but insecure. If use_ssl is also set in client_kwargs, the value set in client_kwargs will take priority.
- s3_additional_kwargs : dict of parameters that are used when calling S3 API
methods. Typically used for things like "ServerSideEncryption".
- client_kwargs : dict of parameters for the botocore client
- requester_pays : bool (False)
If RequesterPays buckets are supported.
- default_block_size : int (None)
If given, the default block size value used for open(), if no specific value is given at call time. The built-in default is 5MB.
- default_fill_cache : bool (True)
Whether to use cache filling with open by default. Refer to S3File.open.
- default_cache_type : string ('bytes')
If given, the default cache_type value used for open(). Set to "none" if no caching is desired. See fsspec's documentation for other available cache_type values.
- version_aware : bool (False)
Whether to support bucket versioning. If enabled, this requires the user to have the necessary IAM permissions for dealing with versioned objects.
- config_kwargs : dict of parameters passed to botocore.client.Config
- kwargs : other parameters for core session
- session : aiobotocore AioSession object to be used for all connections.
This session will be used in place of creating a new session inside S3FileSystem. For example: aiobotocore.AioSession(profile='test_user')
- The following parameters are passed on to fsspec:
- skip_instance_cache: to control reuse of instances
- use_listings_cache, listings_expiry_time, max_paths: to control reuse of directory listings
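As a sketch of how these configuration dicts fit together: the specific values below (a local MinIO endpoint, a retry count, an encryption setting) are illustrative assumptions for the example, not s3fs defaults.

```python
# Illustrative configuration only; the endpoint, region and retry values
# are assumptions for this example, not s3fs defaults.
client_kwargs = {
    "region_name": "us-east-1",               # passed to the botocore client
    "endpoint_url": "http://localhost:9000",  # e.g. a local MinIO server
}
config_kwargs = {"retries": {"max_attempts": 5}}  # becomes botocore.client.Config(...)
s3_additional_kwargs = {"ServerSideEncryption": "AES256"}  # sent with S3 API calls

# These would then be handed to the constructor, e.g.:
# fs = S3FileSystem(key="...", secret="...",
#                   client_kwargs=client_kwargs,
#                   config_kwargs=config_kwargs,
#                   s3_additional_kwargs=s3_additional_kwargs)
```

Remember that use_ssl set in client_kwargs overrides the top-level use_ssl argument, as noted above.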
Examples
>>> s3 = S3FileSystem(anon=False)  # doctest: +SKIP
>>> s3.ls('my-bucket/')  # doctest: +SKIP
['my-file.txt']
>>> with s3.open('my-bucket/my-file.txt', mode='rb') as f:  # doctest: +SKIP
...     print(f.read())  # doctest: +SKIP
b'Hello, world!'
Attributes: - s3
transaction
A context within which files are committed together upon exit
Methods
cat(path[, recursive, on_error]) | Fetch (potentially multiple) paths' contents
cat_file(path[, start, end]) | Get the content of a file
checksum(path[, refresh]) | Unique value for current version of file
chmod(path, acl, **kwargs) | Set Access Control on a bucket/key
clear_instance_cache() | Clear the cache of filesystem instances.
connect([kwargs]) | Establish S3 connection object.
copy(path1, path2[, recursive, on_error]) | Copy within two locations in the filesystem
cp(path1, path2, **kwargs) | Alias of FilesystemSpec.copy.
created(path) | Return the created timestamp of a file as a datetime.datetime
current() | Return the most recently created FileSystem
delete(path[, recursive, maxdepth]) | Alias of FilesystemSpec.rm.
disk_usage(path[, total, maxdepth]) | Alias of FilesystemSpec.du.
download(rpath, lpath[, recursive]) | Alias of FilesystemSpec.get.
du(path[, total, maxdepth]) | Space used by files within a path
end_transaction() | Finish write transaction, non-context version
expand_path(path[, recursive, maxdepth]) | Turn one or more globs or directories into a list of all matching paths to files or directories.
from_json(blob) | Recreate a filesystem instance from JSON representation
get(rpath, lpath[, recursive]) | Copy file(s) to local.
get_delegated_s3pars([exp]) | Get temporary credentials from STS, appropriate for sending across a network.
get_file(rpath, lpath, **kwargs) | Copy single remote file to local
get_mapper(root[, check, create]) | Create key/value store based on this file-system
get_tags(path) | Retrieve tag key/values for the given path
getxattr(path, attr_name, **kwargs) | Get an attribute from the metadata.
glob(path, **kwargs) | Find files by glob-matching.
head(path[, size]) | Get the first size bytes from file
info(path[, version_id, refresh]) | Give details of entry at path
invalidate_cache([path]) | Discard any cached directory information
isdir(path) | Is this entry directory-like?
isfile(path) | Is this entry file-like?
listdir(path[, detail]) | Alias of FilesystemSpec.ls.
ls(path[, detail, refresh]) | List single "directory" with or without details
makedir(path[, create_parents]) | Alias of FilesystemSpec.mkdir.
makedirs(path[, exist_ok]) | Recursively make directories
merge(path, filelist, **kwargs) | Create single S3 file from list of S3 files
metadata(path[, refresh]) | Return metadata of path.
mkdirs(path[, exist_ok]) | Alias of FilesystemSpec.makedirs.
modified(path[, version_id, refresh]) | Return the last modified timestamp of file at path as a datetime
move(path1, path2, **kwargs) | Alias of FilesystemSpec.mv.
mv(path1, path2[, recursive, maxdepth]) | Move file(s) from one location to another
open(path[, mode, block_size, cache_options]) | Return a file-like object from the filesystem
pipe(path[, value]) | Put value into path
pipe_file(path, value, **kwargs) | Set the bytes of given file
put(lpath, rpath[, recursive]) | Copy file(s) from local.
put_file(lpath, rpath, **kwargs) | Copy single file to remote
put_tags(path, tags[, mode]) | Set tags for given existing key
read_block(fn, offset, length[, delimiter]) | Read a block of bytes
rename(path1, path2, **kwargs) | Alias of FilesystemSpec.mv.
rm(path[, recursive]) | Delete files.
rm_file(path) | Delete a file
setxattr(path[, copy_kwargs]) | Set metadata.
sign(path[, expiration]) | Create a signed URL representing the given path
size(path) | Size in bytes of file
split_path(path) | Normalise S3 path string into bucket and key.
start_transaction() | Begin write transaction for deferring files, non-context version
stat(path, **kwargs) | Alias of FilesystemSpec.info.
tail(path[, size]) | Get the last size bytes from file
to_json() | JSON representation of this filesystem instance
touch(path[, truncate, data]) | Create empty file or truncate
ukey(path) | Hash of file properties, to tell if it has changed
upload(lpath, rpath[, recursive]) | Alias of FilesystemSpec.put.
url(path[, expires]) | Generate presigned URL to access path by HTTP
walk(path[, maxdepth]) | Return all files below path
Methods without summaries: call_s3, cp_file, exists, find, is_bucket_versioned, mkdir, object_version_info, rmdir
-
checksum(path, refresh=False)[source]
Unique value for current version of file
If the checksum is the same from one moment to another, the contents are guaranteed to be the same. If the checksum changes, the contents might have changed.
Parameters: - path : string/bytes
path of file to get checksum for
- refresh : bool (=False)
if False, look in local cache for file details first
-
chmod(path, acl, **kwargs)[source]
Set Access Control on a bucket/key
See http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl
Parameters: - path : string
the object to set
- acl : string
the value of ACL to apply
-
connect(kwargs={})
Establish S3 connection object.
-
get_delegated_s3pars(exp=3600)[source]
Get temporary credentials from STS, appropriate for sending across a network. Only relevant where the key/secret were explicitly provided.
Parameters: - exp : int
Time in seconds that credentials are good for
Returns: - dict of parameters
-
get_tags(path)
Retrieve tag key/values for the given path
Returns: - {str: str}
-
getxattr(path, attr_name, **kwargs)[source]
Get an attribute from the metadata.
Examples
>>> mys3fs.getxattr('mykey', 'attribute_1')  # doctest: +SKIP
'value_1'
-
info(path, version_id=None, refresh=False)[source]
Give details of entry at path
Returns a single dictionary, with exactly the same information as ls would with detail=True.
The default implementation calls ls and could be overridden by a shortcut. kwargs are passed on to ls().
Some file systems might not be able to measure the file's size, in which case the returned dict will include 'size': None.
Returns: - dict with keys: name (full path in the FS), size (in bytes), type (file,
directory, or something else) and other FS-specific keys.
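The shape of the returned dict can be sketched with a hypothetical entry; the path and size below are invented for illustration, and a real result may carry additional FS-specific keys.

```python
# Hypothetical info() result for a small object; keys follow the description above.
entry = {
    "name": "my-bucket/my-file.txt",  # full path in the filesystem
    "size": 13,                       # in bytes, or None if not measurable
    "type": "file",                   # "file", "directory", or something else
}

# The three documented keys are always present.
assert {"name", "size", "type"} <= set(entry)
```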
-
invalidate_cache(path=None)[source]
Discard any cached directory information
Parameters: - path: string or None
If None, clear all cached listings; otherwise, clear listings at or under the given path.
-
ls(path, detail=False, refresh=False, **kwargs)[source]
List single "directory" with or without details
Parameters: - path : string/bytes
location at which to list files
- detail : bool (=False)
if True, each list item is a dict of file properties; otherwise, returns list of filenames
- refresh : bool (=False)
if False, look in local cache for file details first
- kwargs : dict
additional arguments passed on
-
makedirs(path, exist_ok=False)[source]
Recursively make directories
Creates directory at path and any intervening required directories. Raises exception if, for instance, the path already exists but is a file.
Parameters: - path: str
leaf directory name
- exist_ok: bool (False)
If False, will error if the target already exists
-
merge(path, filelist, **kwargs)
Create single S3 file from list of S3 files
Uses multi-part, no data is downloaded. The original files are not deleted.
Parameters: - path : str
The final file to produce
- filelist : list of str
The paths, in order, to assemble into the final file.
-
metadata(path, refresh=False, **kwargs)[source]
Return metadata of path.
Metadata is cached unless refresh=True.
Parameters: - path : string/bytes
filename to get metadata for
- refresh : bool (=False)
if False, look in local cache for file metadata first
-
modified(path, version_id=None, refresh=False)[source]
Return the last modified timestamp of file at path as a datetime
-
put_tags(path, tags[, mode])
Set tags for given existing key
Tags are a str:str mapping that can be attached to any key, see https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/allocation-tag-restrictions.html
This is similar to, but distinct from, key metadata, which is usually set at key creation time.
Parameters: - path: str
Existing key to attach tags to
- tags: dict str, str
Tags to apply.
- mode:
One of 'o' or 'm'. 'o': will overwrite any existing tags. 'm': will merge in new tags with existing tags; incurs two remote calls.
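The difference between the two modes can be sketched with plain dicts. This only illustrates the semantics described above; it is not the s3fs implementation, and the tag names are invented.

```python
existing = {"team": "data", "env": "dev"}   # tags already on the key
new = {"env": "prod", "owner": "alice"}     # tags passed to put_tags

# mode='o': the new tags replace whatever was there.
overwrite_result = dict(new)

# mode='m': existing tags are fetched first (one extra remote call),
# then merged with the new ones, new values winning on conflict.
merge_result = {**existing, **new}

print(overwrite_result)  # {'env': 'prod', 'owner': 'alice'}
print(merge_result)      # {'team': 'data', 'env': 'prod', 'owner': 'alice'}
```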
-
rm(path, recursive=False, **kwargs)[source]
Delete files.
Parameters: - path: str or list of str
File(s) to delete.
- recursive: bool
If file(s) are directories, recursively delete contents and then also remove the directory
- maxdepth: int or None
Depth to pass to walk for finding files to delete, if recursive. If None, there will be no limit and infinite recursion may be possible.
-
setxattr(path, copy_kwargs=None, **kw_args)[source]
Set metadata.
Attributes have to be of the form documented in the Metadata Reference (see the link below).
Parameters: - kw_args : key-value pairs like field="value", where the values must be
strings. Does not alter existing fields, unless the field appears here; if the value is None, delete the field.
- copy_kwargs : dict, optional
dictionary of additional parameters to use for the underlying s3.copy_object.
Examples
>>> mys3file.setxattr(attribute_1='value1', attribute_2='value2')  # doctest: +SKIP
>>> # Example for use with copy_kwargs
>>> mys3file.setxattr(copy_kwargs={'ContentType': 'application/pdf'},
...                   attribute_1='value1')  # doctest: +SKIP
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#object-metadata
-
sign(path, expiration=100, **kwargs)[source]
Create a signed URL representing the given path
Some implementations allow temporary URLs to be generated, as a way of delegating credentials.
Parameters: - path : str
The path on the filesystem
- expiration : int
Number of seconds to enable the URL for (if supported)
Returns: - URL : str
The signed URL
Raises: - NotImplementedError : if method is not implemented for a filesystem
-
split_path(path) → Tuple[str, str, Optional[str]][source]
Normalise S3 path string into bucket and key.
Parameters: - path : string
Input path, like s3://mybucket/path/to/file
Examples
>>> split_path("s3://mybucket/path/to/file")
['mybucket', 'path/to/file', None]
>>> split_path("s3://mybucket/path/to/versioned_file?versionId=some_version_id")
['mybucket', 'path/to/versioned_file', 'some_version_id']
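The behaviour shown in the examples can be approximated with the standard library. This is a sketch of the semantics only, not the s3fs implementation, and split_path_sketch is a hypothetical name.

```python
from urllib.parse import urlsplit, parse_qs

def split_path_sketch(path):
    """Approximate S3FileSystem.split_path: (bucket, key, version_id)."""
    parts = urlsplit(path)
    bucket = parts.netloc                 # first component after s3://
    key = parts.path.lstrip("/")          # rest of the path, no leading slash
    # versionId query parameter, if any, becomes the third element.
    version_id = parse_qs(parts.query).get("versionId", [None])[0]
    return [bucket, key, version_id]

print(split_path_sketch("s3://mybucket/path/to/file"))
# ['mybucket', 'path/to/file', None]
```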
-
url(path, expires=3600, **kwargs)[source]
Generate presigned URL to access path by HTTP
Parameters: - path : string
the key path we are interested in
- expires : int
the number of seconds this signature will be good for.
-
walk(path, maxdepth=None, **kwargs)[source]
Return all files below path
List all files, recursing into subdirectories; output is iterator-style, like os.walk(). For a simple list of files, find() is available.
Note that the "files" output will include anything that is not a directory, such as links.
Parameters: - path: str
Root to recurse into
- maxdepth: int
Maximum recursion depth. None means limitless, but not recommended on link-based file-systems.
- kwargs: passed to ``ls``
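Since walk follows the os.walk() iteration pattern, the local equivalent below shows the consumption style that carries over; the directory layout is invented for the example.

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Invented layout: root/a/f.txt
    sub = pathlib.Path(root, "a")
    sub.mkdir()
    (sub / "f.txt").write_text("hello")

    found = []
    # S3FileSystem.walk yields (path, dirs, files) triples in the same style,
    # with S3 paths in place of local ones.
    for dirpath, dirnames, filenames in os.walk(root):
        found.extend(filenames)

print(found)  # ['f.txt']
```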
-
-
class s3fs.core.S3File(s3, path, mode='rb', block_size=5242880, acl='', version_id=None, fill_cache=True, s3_additional_kwargs=None, autocommit=True, cache_type='bytes', requester_pays=False)[source]
Open S3 key as a file. Data is only loaded and cached on demand.
Parameters: - s3 : S3FileSystem
botocore connection
- path : string
S3 bucket/key to access
- mode : str
One of ‘rb’, ‘wb’, ‘ab’. These have the same meaning as they do for the built-in open function.
- block_size : int
read-ahead size for finding delimiters
- fill_cache : bool
If seeking to a new part of the file beyond the current buffer, with this True, the buffer will be filled between the sections to best support random access. When reading only a few specific chunks out of a file, performance may be better if False.
- acl: str
Canned ACL to apply
- version_id : str
Optional version to read the file at. If not specified this will default to the current version of the object. This is only used for reading.
- requester_pays : bool (False)
If RequesterPays buckets are supported.
See also
S3FileSystem.open - used to create S3File objects
Examples
>>> s3 = S3FileSystem()  # doctest: +SKIP
>>> with s3.open('my-bucket/my-file.txt', mode='rb') as f:  # doctest: +SKIP
...     ...  # doctest: +SKIP
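Because S3File implements the usual binary file interface (read, seek, tell), code written against any file-like object carries over. The in-memory stand-in below shows the calls involved; io.BytesIO is only a local substitute for a real S3File here.

```python
import io

# Local stand-in for an S3File opened in 'rb' mode.
f = io.BytesIO(b"Hello, world!")

f.seek(7)         # set current file location
print(f.tell())   # 7
print(f.read(5))  # b'world'
f.seek(-1, 2)     # seek relative to end, as with any file object
print(f.read())   # b'!'
```

With a real S3File, each read fetches (and caches) only the byte ranges needed, in block_size chunks.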
Attributes: - closed
Methods
close() | Close file
commit() | Move from temp to final destination
discard() | Throw away temporary file
fileno(/) | Returns underlying file descriptor if one exists.
flush([force]) | Write buffered data to backend store.
getxattr(xattr_name, **kwargs) | Get an attribute from the metadata.
info() | File information about this path
isatty(/) | Return whether this is an 'interactive' stream.
metadata([refresh]) | Return metadata of file.
read([length]) | Return data from cache, or fetch pieces as necessary
readable() | Whether opened for reading
readinto(b) | Mirrors builtin file's readinto method
readline() | Read until first occurrence of newline character
readlines() | Return all data, split by the newline character
readuntil([char, blocks]) | Return data between current position and first occurrence of char
seek(loc[, whence]) | Set current file location
seekable() | Whether is seekable (only in read mode)
setxattr([copy_kwargs]) | Set metadata.
tell() | Current file location
truncate | Truncate file to size bytes.
url(**kwargs) | HTTP URL to read this file (if it already exists)
writable() | Whether opened for writing
write(data) | Write data to buffer.
writelines(lines, /) | Write a list of lines to stream.
Methods without summaries: readinto1
-
getxattr(xattr_name, **kwargs)[source]
Get an attribute from the metadata. See getxattr().
Examples
>>> mys3file.getxattr('attribute_1')  # doctest: +SKIP
'value_1'
-
metadata(refresh=False, **kwargs)[source]
Return metadata of file. See metadata().
Metadata is cached unless refresh=True.
-
s3fs.mapping.S3Map(root, s3, check=False, create=False)[source]
Mirror previous class, not implemented in fsspec
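The object this returns behaves like a MutableMapping from key names to bytes. The plain dict below only stands in for the store an S3Map over a real bucket would expose; the key names are invented for illustration.

```python
# A plain dict standing in for: d = S3Map('my-bucket/prefix', s3)
store = {}

# Keys are path-like strings relative to the root; values are bytes.
store["2024/01/data.bin"] = b"\x00\x01\x02"
print(store["2024/01/data.bin"])  # b'\x00\x01\x02'
print(list(store))                # ['2024/01/data.bin']
del store["2024/01/data.bin"]
print(len(store))                 # 0
```

This mapping interface is what lets key/value-oriented libraries treat a bucket prefix as a store.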