fsspec_utils.core.base
API Documentation
This module provides core filesystem functionalities and utilities, including custom cache mappers, enhanced cached filesystems, and a GitLab filesystem implementation.
FileNameCacheMapper
Maps remote file paths to local cache paths while preserving directory structure.
This cache mapper maintains the original file path structure in the cache directory, creating necessary subdirectories as needed.
Attributes:
- directory (str): Base directory for cached files
__init__()
Initialize cache mapper with base directory.
Parameter | Type | Description |
---|---|---|
directory | str | Base directory where cached files will be stored |
__call__()
Map remote file path to cache file path.
Creates necessary subdirectories in the cache directory to maintain the original path structure.
Parameter | Type | Description |
---|---|---|
path | str | Original file path from remote filesystem |
Returns | Type | Description |
---|---|---|
str | str | Cache file path that preserves original structure |
Example:
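The mapping described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the class name `PathPreservingMapper` is hypothetical, and returning the path relative to the cache root is an assumption about how the result is consumed.

```python
import os
import tempfile


class PathPreservingMapper:
    """Hypothetical stand-in for a structure-preserving cache mapper."""

    def __init__(self, directory: str) -> None:
        # Base directory where cached files will be stored.
        self.directory = directory

    def __call__(self, path: str) -> str:
        # Drop leading slashes so the remote path nests under the cache root
        # (an assumption; the real mapper may treat absolute paths differently).
        relative = path.lstrip("/")
        # Create the matching subdirectory so a later write to the cache succeeds.
        parent = os.path.dirname(relative)
        os.makedirs(os.path.join(self.directory, parent), exist_ok=True)
        return relative


cache_root = tempfile.mkdtemp()
mapper = PathPreservingMapper(cache_root)
mapped = mapper("bucket/data/2024/file.parquet")
print(mapped)
print(os.path.isdir(os.path.join(cache_root, "bucket/data/2024")))
```

Keeping the remote layout in the cache makes cached files easy to locate by hand, at the cost of creating directories on every lookup.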
MonitoredSimpleCacheFileSystem
Enhanced caching filesystem with monitoring and improved path handling.
This filesystem extends SimpleCacheFileSystem to provide:
- Verbose logging of cache operations
- Improved path mapping for cache files
- Enhanced synchronization capabilities
- Better handling of parallel operations
Attributes:
- _verbose (bool): Whether to print verbose cache operations
- _mapper (FileNameCacheMapper): Maps remote paths to cache paths
- storage (list[str]): List of cache storage locations
- fs (AbstractFileSystem): Underlying filesystem being cached
__init__()
Initialize monitored cache filesystem.
Parameter | Type | Description |
---|---|---|
fs | Optional[fsspec.AbstractFileSystem] | Underlying filesystem to cache. If None, creates a local filesystem. |
cache_storage | Union[str, list[str]] | Cache storage location(s). Can be a string path or a list of paths. |
verbose | bool | Whether to enable verbose logging of cache operations. |
**kwargs | Any | Additional arguments passed to SimpleCacheFileSystem. |
Example:
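Because `cache_storage` accepts either a single path or a list, while the `storage` attribute is documented as a list, some normalization step is implied. A minimal sketch of that step follows; the helper name `normalize_cache_storage` is hypothetical and is not part of the documented API.

```python
from typing import List, Union


def normalize_cache_storage(cache_storage: Union[str, List[str]]) -> List[str]:
    """Return cache locations as a list, whichever form the caller passed."""
    if isinstance(cache_storage, str):
        # A single path becomes a one-element list.
        return [cache_storage]
    # Copy the list so later changes by the caller do not leak in.
    return list(cache_storage)


single = normalize_cache_storage("/tmp/cache")
several = normalize_cache_storage(["/tmp/cache_a", "/tmp/cache_b"])
print(single)
print(several)
```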
_check_cache()
Check if file exists in cache and return cache path if found.
Parameter | Type | Description |
---|---|---|
path | str | Remote file path to check |
Returns | Type | Description |
---|---|---|
Optional[str] | str or None | Cache file path if found, None otherwise |
Example:
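The lookup contract (a cache path on a hit, `None` on a miss) can be illustrated without the library. The sketch below assumes that each storage location mirrors the remote path layout; the function name `check_cache` and the first-match-wins order are assumptions made for illustration.

```python
import os
import tempfile
from typing import List, Optional


def check_cache(path: str, storage: List[str]) -> Optional[str]:
    """Return the first existing cache file for ``path``, or None."""
    relative = path.lstrip("/")
    for location in storage:
        candidate = os.path.join(location, relative)
        if os.path.exists(candidate):
            return candidate
    return None


# Build a temporary cache holding one file.
cache_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(cache_dir, "data"))
with open(os.path.join(cache_dir, "data", "a.txt"), "w") as handle:
    handle.write("cached")

hit = check_cache("data/a.txt", [cache_dir])
miss = check_cache("data/missing.txt", [cache_dir])
print(hit)
print(miss)
```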
_check_file()
Ensure file is in cache, downloading if necessary.
Parameter | Type | Description |
---|---|---|
path | str | Remote file path |
Returns | Type | Description |
---|---|---|
str | str | Local cache path for the file |
Example:
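The "download if necessary" behaviour amounts to a check-then-fetch pattern. The following sketch shows that pattern under stated assumptions: `ensure_cached` and `fake_fetch` are hypothetical names, and the fetch callable stands in for whatever the underlying filesystem does to retrieve bytes.

```python
import os
import tempfile
from typing import Callable, List


def ensure_cached(path: str, cache_dir: str, fetch: Callable[[str], bytes]) -> str:
    """Return the local cache path for ``path``, fetching it on a miss."""
    local_path = os.path.join(cache_dir, path.lstrip("/"))
    if not os.path.exists(local_path):
        # Cache miss: create the parent directory and store the fetched bytes.
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        with open(local_path, "wb") as handle:
            handle.write(fetch(path))
    return local_path


calls: List[str] = []


def fake_fetch(path: str) -> bytes:
    # Stand-in for a remote download; records each call for inspection.
    calls.append(path)
    return b"remote bytes"


cache_dir = tempfile.mkdtemp()
first = ensure_cached("data/report.csv", cache_dir, fake_fetch)
second = ensure_cached("data/report.csv", cache_dir, fake_fetch)
print(first == second, len(calls))
```

The second call finds the file already present, so the fetch callable runs only once.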
GitLabFileSystem
Filesystem interface for GitLab repositories.
Provides read-only access to files in GitLab repositories, including:
- Public and private repositories
- Self-hosted GitLab instances
- Branch/tag/commit selection
- Token-based authentication
Attributes:
- protocol (str): Always "gitlab"
- base_url (str): GitLab instance URL
- project_id (str): Project ID
- project_name (str): Project name/path
- ref (str): Git reference (branch, tag, commit)
- token (str): Access token
- api_version (str): API version
__init__()
Initialize GitLab filesystem.
Parameter | Type | Description |
---|---|---|
base_url | str | GitLab instance URL |
project_id | Optional[Union[str, int]] | Project ID number |
project_name | Optional[str] | Project name/path (alternative to project_id) |
ref | str | Git reference (branch, tag, or commit SHA) |
token | Optional[str] | GitLab personal access token |
api_version | str | API version to use |
**kwargs | Any | Additional filesystem arguments |
Raises | Type | Description |
---|---|---|
ValueError | ValueError | If neither project_id nor project_name is provided |
Example:
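The `project_id` / `project_name` rule can be sketched as a small validation helper. This is an illustration under assumptions: the function name `resolve_project_identifier` is hypothetical, and preferring `project_id` when both are given is a guess. The URL-encoding step reflects the GitLab REST API convention that a `namespace/project` path may be used in place of the numeric ID when it is URL-encoded.

```python
from typing import Optional, Union
from urllib.parse import quote


def resolve_project_identifier(
    project_id: Optional[Union[str, int]] = None,
    project_name: Optional[str] = None,
) -> str:
    """Return the identifier to place in a GitLab API URL."""
    if project_id is not None:
        return str(project_id)
    if project_name:
        # Encode "/" as %2F so the path can stand in for the numeric ID.
        return quote(project_name, safe="")
    raise ValueError("Either project_id or project_name must be provided")


by_id = resolve_project_identifier(project_id=12345)
by_name = resolve_project_identifier(project_name="group/my-repo")
print(by_id)
print(by_name)
```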
_get_file_content()
Get file content from GitLab API.
Parameter | Type | Description |
---|---|---|
path | str | File path in repository |
Returns | Type | Description |
---|---|---|
bytes | bytes | File content as bytes |
Raises | Type | Description |
---|---|---|
FileNotFoundError | FileNotFoundError | If file doesn't exist |
requests.HTTPError | requests.HTTPError | For other HTTP errors |
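A request for raw file content can be sketched as URL construction plus status handling. The sketch assumes the GitLab repository files endpoint (`/projects/:id/repository/files/:file_path/raw`) and a default `api_version` of `"v4"`; the helper names `raw_file_url` and `raise_if_missing` are hypothetical, and no network call is made.

```python
from urllib.parse import quote, urlencode


def raw_file_url(base_url: str, project: str, path: str, ref: str,
                 api_version: str = "v4") -> str:
    """Build the URL for a file's raw content at a given Git reference."""
    # The file path must be URL-encoded, including its slashes.
    encoded_path = quote(path, safe="")
    query = urlencode({"ref": ref})
    return (
        f"{base_url.rstrip('/')}/api/{api_version}/projects/{project}"
        f"/repository/files/{encoded_path}/raw?{query}"
    )


def raise_if_missing(status_code: int, path: str) -> None:
    """Translate a 404 response into FileNotFoundError."""
    if status_code == 404:
        raise FileNotFoundError(path)
    # Other error codes would be left to the HTTP client, for example
    # requests' response.raise_for_status(), which raises requests.HTTPError.


url = raw_file_url("https://gitlab.com", "12345", "src/app/main.py", "main")
print(url)
```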
_open()
Open file for reading.
Parameter | Type | Description |
---|---|---|
path | str | File path to open |
mode | str | File mode (only 'rb' and 'r' supported) |
block_size | Optional[int] | Block size for reading (unused) |
cache_options | Optional[dict] | Cache options (unused) |
**kwargs | Any | Additional options |
Returns | Type | Description |
---|---|---|
File-like object | File-like object | File-like object for reading |
Raises | Type | Description |
---|---|---|
ValueError | ValueError | If mode is not supported |
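The mode restriction can be illustrated with in-memory buffers. In this sketch, `open_content` is a hypothetical helper that wraps already-fetched bytes; decoding text mode as UTF-8 is an assumption, since the documentation does not state an encoding.

```python
import io


def open_content(content: bytes, mode: str = "rb"):
    """Wrap fetched bytes in a read-only file-like object."""
    if mode == "rb":
        return io.BytesIO(content)
    if mode == "r":
        # Text mode: decode the same bytes lazily (UTF-8 is an assumption).
        return io.TextIOWrapper(io.BytesIO(content), encoding="utf-8")
    raise ValueError(f"Mode {mode!r} is not supported; use 'rb' or 'r'")


binary_data = open_content(b"hello").read()
text_data = open_content(b"hello", mode="r").read()
print(binary_data, text_data)
```

Rejecting write modes up front keeps the read-only nature of the filesystem explicit instead of failing later during a write.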
cat()
Get file contents as bytes.
Parameter | Type | Description |
---|---|---|
path | str | File path |
**kwargs | Any | Additional options |
Returns | Type | Description |
---|---|---|
bytes | bytes | File content as bytes |
ls()
List directory contents.
Parameter | Type | Description |
---|---|---|
path | str | Directory path to list |
detail | bool | Whether to return detailed information |
**kwargs | Any | Additional options |
Returns | Type | Description |
---|---|---|
list | list | List of files/directories or their details |
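The effect of `detail` on the return value can be sketched as a conversion from tree entries to either names or info dictionaries. The input shape below follows the GitLab repository tree response (`path` plus a `type` of `"tree"` or `"blob"`); the helper name `format_listing` and the exact keys of the output dictionaries are assumptions.

```python
from typing import Any, Dict, List, Union


def format_listing(
    entries: List[Dict[str, Any]], detail: bool = False
) -> Union[List[str], List[Dict[str, Any]]]:
    """Convert tree entries into names or fsspec-style info dictionaries."""
    infos = [
        {
            "name": entry["path"],
            # GitLab marks directories as "tree" and files as "blob".
            "type": "directory" if entry["type"] == "tree" else "file",
        }
        for entry in entries
    ]
    if detail:
        return infos
    return [info["name"] for info in infos]


sample_entries = [
    {"name": "src", "path": "src", "type": "tree"},
    {"name": "README.md", "path": "README.md", "type": "blob"},
]
names = format_listing(sample_entries)
details = format_listing(sample_entries, detail=True)
print(names)
print(details)
```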
exists()
Check if file or directory exists.
Parameter | Type | Description |
---|---|---|
path | str | Path to check |
**kwargs | Any | Additional options |
Returns | Type | Description |
---|---|---|
bool | bool | True if path exists, False otherwise |
info()
Get file information.
Parameter | Type | Description |
---|---|---|
path | str | File path |
**kwargs | Any | Additional options |
Returns | Type | Description |
---|---|---|
dict | dict | Dictionary with file information |
Raises | Type | Description |
---|---|---|
FileNotFoundError | FileNotFoundError | If file not found |
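The documented contracts of `exists()` (returns a bool) and `info()` (raises `FileNotFoundError`) fit together in a common pattern where existence is derived from the metadata lookup. The sketch below shows that pattern against an in-memory dictionary; `FILES` is a hypothetical stand-in for a repository, and whether this class implements `exists()` this way is an assumption.

```python
from typing import Any, Dict

# Hypothetical stand-in for repository contents.
FILES: Dict[str, bytes] = {"docs/index.md": b"# Title"}


def info(path: str) -> Dict[str, Any]:
    """Return metadata for ``path`` or raise FileNotFoundError."""
    if path not in FILES:
        raise FileNotFoundError(path)
    return {"name": path, "type": "file", "size": len(FILES[path])}


def exists(path: str) -> bool:
    """Report existence by attempting the metadata lookup."""
    try:
        info(path)
    except FileNotFoundError:
        return False
    return True


found = exists("docs/index.md")
missing = exists("docs/absent.md")
print(found, missing)
```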
filesystem()
Get filesystem instance with enhanced configuration options.
Creates filesystem instances with support for storage options classes, intelligent caching, and protocol inference from paths.
Parameter | Type | Description |
---|---|---|
protocol_or_path | str | Filesystem protocol (e.g., "s3", "file") or path with protocol prefix |
storage_options | Optional[Union[BaseStorageOptions, dict]] | Storage configuration as BaseStorageOptions instance or dict |
cached | bool | Whether to wrap filesystem in caching layer |
cache_storage | Optional[str] | Cache directory path (if cached=True) |
verbose | bool | Enable verbose logging for cache operations |
dirfs | bool | Whether to wrap the filesystem in a DirFileSystem. Defaults to True. |
base_fs | AbstractFileSystem | An existing filesystem to wrap. |
**kwargs | Any | Additional filesystem arguments |
Returns | Type | Description |
---|---|---|
AbstractFileSystem | fsspec.AbstractFileSystem | Configured filesystem instance |
Example:
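Protocol inference from `protocol_or_path` can be illustrated with a small parser. This is only a sketch of the idea, not the function's actual logic: `infer_protocol` is a hypothetical name, and treating any value that contains a slash but no `://` as a local path is an assumption.

```python
from typing import Tuple


def infer_protocol(protocol_or_path: str) -> Tuple[str, str]:
    """Split a value into (protocol, path) using simple heuristics."""
    if "://" in protocol_or_path:
        # Explicit prefix such as "s3://bucket/data".
        protocol, _, path = protocol_or_path.partition("://")
        return protocol, path
    if "/" in protocol_or_path:
        # No prefix but looks like a path: assume the local filesystem.
        return "file", protocol_or_path
    # A bare protocol name such as "s3".
    return protocol_or_path, ""


prefixed = infer_protocol("s3://bucket/data")
bare = infer_protocol("s3")
local = infer_protocol("/tmp/data")
print(prefixed, bare, local)
```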
get_filesystem()
Get filesystem instance with enhanced configuration options.
Deprecated
Use filesystem instead. This function will be removed in a future version.
Creates filesystem instances with support for storage options classes, intelligent caching, and protocol inference from paths.
Parameter | Type | Description |
---|---|---|
protocol_or_path | str | Filesystem protocol (e.g., "s3", "file") or path with protocol prefix |
storage_options | Optional[Union[BaseStorageOptions, dict]] | Storage configuration as BaseStorageOptions instance or dict |
cached | bool | Whether to wrap filesystem in caching layer |
cache_storage | Optional[str] | Cache directory path (if cached=True) |
verbose | bool | Enable verbose logging for cache operations |
**kwargs | Any | Additional filesystem arguments |
Returns | Type | Description |
---|---|---|
fsspec.AbstractFileSystem | fsspec.AbstractFileSystem | Configured filesystem instance |
Example:
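A deprecated alias of this kind is commonly a thin wrapper that warns and then delegates. The sketch below shows that pattern with placeholder functions: `filesystem_stub` and `get_filesystem_stub` are hypothetical stand-ins, and whether the real `get_filesystem()` emits a `DeprecationWarning` is an assumption.

```python
import warnings
from typing import Any, Dict


def filesystem_stub(protocol_or_path: str, **kwargs: Any) -> Dict[str, Any]:
    """Placeholder for the replacement function; echoes its arguments."""
    return {"protocol_or_path": protocol_or_path, **kwargs}


def get_filesystem_stub(protocol_or_path: str, **kwargs: Any) -> Dict[str, Any]:
    """Deprecated alias that warns, then delegates to the replacement."""
    warnings.warn(
        "get_filesystem() is deprecated; use filesystem() instead.",
        DeprecationWarning,
        stacklevel=2,  # Attribute the warning to the caller's line.
    )
    return filesystem_stub(protocol_or_path, **kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = get_filesystem_stub("s3", cached=True)

print(result)
print(caught[0].category.__name__)
```

Since the two parameter tables share `protocol_or_path`, `storage_options`, `cached`, `cache_storage`, and `verbose`, migrating a call should mostly be a rename, though `filesystem()` additionally documents `dirfs` (defaulting to True) and `base_fs`.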