mlsnippet.datafs
-
class mlsnippet.datafs.TarArchiveFS(archive_file, strict=False)
Bases: mlsnippet.datafs.archivefs._ArchiveFS
Tar archive file based DataFS.
-
__init__(archive_file, strict=False) Construct a new TarArchiveFS.
Parameters:
-
_close() Override this method to destroy the internal states.
-
_init() Override this method to initialize the internal states.
-
isfile(filename) Check whether or not a file exists.
Parameters: filename (str) – The name of the file.
Returns:
Return type: bool
-
iter_files(meta_keys=None) Iterate through all the files in this DataFS.
Parameters: meta_keys (None or Iterable[str]) – The keys of the meta data to be retrieved. (default None)
Yields: (filename, content, [meta-data…]) – A tuple containing the name of a file, its content, and the values of each meta data corresponding to meta_keys. If a requested key is absent for a file, None will take its place.
Raises: UnsupportedOperation – If meta_keys is specified, but the READ_META capacity is absent.
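As a rough illustration of what iterating over a tar-backed file set involves, the sketch below yields (name, content) pairs using only the standard-library tarfile module. It is not the TarArchiveFS implementation; iter_tar_files and make_demo_tar are hypothetical helper names, and meta data is left out entirely.

```python
import io
import tarfile


def iter_tar_files(fileobj):
    """Yield (name, content) for every regular file in a tar archive."""
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar:
            if member.isfile():
                # extractfile() returns a readable file-like object
                # for regular files.
                yield member.name, tar.extractfile(member).read()


def make_demo_tar(files):
    """Build an in-memory tar archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


demo = make_demo_tar({"a.txt": b"hello", "dir/b.bin": b"\x00\x01"})
entries = list(iter_tar_files(demo))
```

Directory entries are skipped here so that only regular files are reported, which matches the file-oriented view described above.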
-
iter_names() Iterate through all the file names in this DataFS.
Yields: str – The file name of each file.
-
open(filename, mode) Open a file-like object to read / write a file.
Parameters:
- filename (str) – The name of the file.
- mode ({'r', 'w'}) – The open mode of the file, either 'r' for reading or 'w' for writing. Other modes are not supported in general.
Returns: The file-like object. This object will be closed immediately when this DataFS instance is closed.
Return type: file-like
Raises:
- InvalidOpenMode – If the specified mode is not supported, e.g., mode == 'w' but the WRITE_DATA capacity is absent.
- DataFileNotExist – If mode == 'r' but filename does not exist.
-
-
class mlsnippet.datafs.ZipArchiveFS(archive_file, strict=False)
Bases: mlsnippet.datafs.archivefs._ArchiveFS
Zip archive file based DataFS.
-
__init__(archive_file, strict=False) Construct a new ZipArchiveFS.
Parameters:
-
_close() Override this method to destroy the internal states.
-
_init() Override this method to initialize the internal states.
-
isfile(filename) Check whether or not a file exists.
Parameters: filename (str) – The name of the file.
Returns:
Return type: bool
-
iter_files(meta_keys=None) Iterate through all the files in this DataFS.
Parameters: meta_keys (None or Iterable[str]) – The keys of the meta data to be retrieved. (default None)
Yields: (filename, content, [meta-data…]) – A tuple containing the name of a file, its content, and the values of each meta data corresponding to meta_keys. If a requested key is absent for a file, None will take its place.
Raises: UnsupportedOperation – If meta_keys is specified, but the READ_META capacity is absent.
-
iter_names() Iterate through all the file names in this DataFS.
Yields: str – The file name of each file.
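For a zip archive, listing names and checking existence can be sketched with the standard-library zipfile module. This is only an illustration under the assumption that directory entries are not reported as files; iter_zip_names and zip_isfile are hypothetical helpers, not part of mlsnippet.

```python
import io
import zipfile


def iter_zip_names(fileobj):
    """Yield the name of every non-directory entry in a zip archive."""
    with zipfile.ZipFile(fileobj, mode="r") as zf:
        for info in zf.infolist():
            if not info.is_dir():
                yield info.filename


def zip_isfile(fileobj, filename):
    """Return True if `filename` names a regular entry in the archive."""
    return filename in set(iter_zip_names(fileobj))


# Build a small in-memory archive to exercise the helpers.
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w") as zf:
    zf.writestr("a.txt", b"hello")
    zf.writestr("sub/b.txt", b"world")

names = list(iter_zip_names(buf))
```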
-
open(filename, mode) Open a file-like object to read / write a file.
Parameters:
- filename (str) – The name of the file.
- mode ({'r', 'w'}) – The open mode of the file, either 'r' for reading or 'w' for writing. Other modes are not supported in general.
Returns: The file-like object. This object will be closed immediately when this DataFS instance is closed.
Return type: file-like
Raises:
- InvalidOpenMode – If the specified mode is not supported, e.g., mode == 'w' but the WRITE_DATA capacity is absent.
- DataFileNotExist – If mode == 'r' but filename does not exist.
-
-
class mlsnippet.datafs.DataFSCapacity(mode=0)
Bases: object
Enumeration class to represent the capacity of a DataFS.
There are 7 different categories of capacities. A method of DataFS may only work if the DataFS has the one or more capacities that the method requires. One may check whether the DataFS has a certain capacity via can_[capacity_name]().
-
ALL = 127 All capacities are supported.
-
LIST_META = 16 Can enumerate the meta keys for a particular file.
-
QUICK_COUNT = 32 Can get the count of files without iterating through them.
-
RANDOM_SAMPLE = 64 Can randomly sample files without obtaining the whole file list.
-
READ_META = 4 Can read meta data.
-
READ_WRITE_DATA = 3 Can read and write file data.
-
READ_WRITE_META = 12 Can read and write meta data.
-
WRITE_DATA = 2 Can write file data.
-
WRITE_META = 8 Can write meta data.
-
__init__(mode=0) Construct a new DataFSCapacity.
Parameters: mode (int) – The mode number of this capacity flag.
-
can_list_meta()
-
can_quick_count()
-
can_random_sample()
-
can_read_data()
-
can_read_meta()
-
can_write_data()
-
can_write_meta()
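The constants above are powers of two combined into a bit mask, so a capacity check amounts to a bitwise AND. The sketch below mirrors the listed values with a standalone enum.IntFlag; it is not the DataFSCapacity class itself. READ_DATA = 1 is an inference from READ_WRITE_DATA = 3 and WRITE_DATA = 2 (it is not listed above), and Capacity and has_capacity are hypothetical names.

```python
from enum import IntFlag


class Capacity(IntFlag):
    """Standalone mirror of the flag values listed above."""
    READ_DATA = 1        # inferred: READ_WRITE_DATA (3) minus WRITE_DATA (2)
    WRITE_DATA = 2
    READ_META = 4
    WRITE_META = 8
    LIST_META = 16
    QUICK_COUNT = 32
    RANDOM_SAMPLE = 64
    READ_WRITE_DATA = READ_DATA | WRITE_DATA
    READ_WRITE_META = READ_META | WRITE_META
    ALL = 127


def has_capacity(mode, flag):
    """Return True if every bit of `flag` is set in `mode`."""
    return (mode & flag) == flag


# A hypothetical read-only backend that can also read meta data.
read_only = Capacity.READ_DATA | Capacity.READ_META
```

With this layout, a combined flag such as READ_WRITE_META is satisfied only when both of its component bits are present.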
-
-
class mlsnippet.datafs.DataFS(capacity, strict=False)
Bases: mlsnippet.utils.concepts.AutoInitAndCloseable
Base class for all data file systems.
A DataFS provides access to a machine learning dataset stored in a file-system-like backend. For example, large image datasets are usually stored as raw image files, gathered in a directory. Such a true file system can be accessed by LocalFS.
Apart from a true file system, some may instead store these images in a virtual file system provided by a database, for example, the GridFS of MongoDB, which can be accessed via MongoFS.
-
__init__(capacity, strict=False) Initialize the base DataFS class.
Parameters:
- capacity (int or DataFSCapacity) – Specify the capacity of the derived DataFS.
- strict (bool) – Whether or not this DataFS works in strict mode? (default False)
In strict mode, the following behaviours will take place:
- Accessing the value of a non-existent meta key will raise a MetaKeyNotExist, instead of returning None.
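The difference between strict and non-strict meta lookup can be sketched over a plain dict. The MetaKeyNotExist class below is a local stand-in that only mirrors the documented constructor arguments, and lookup_meta is a hypothetical helper, not a DataFS method.

```python
class MetaKeyNotExist(Exception):
    """Local stand-in mirroring the documented (filename, meta_key) arguments."""

    def __init__(self, filename, meta_key):
        super().__init__(filename, meta_key)
        self.filename = filename
        self.meta_key = meta_key


def lookup_meta(meta_store, filename, meta_keys, strict=False):
    """Return a tuple of meta values for `meta_keys`.

    Missing keys become None unless `strict` is set, in which case
    MetaKeyNotExist is raised for the first missing key.
    """
    meta = meta_store[filename]
    values = []
    for key in meta_keys:
        if key in meta:
            values.append(meta[key])
        elif strict:
            raise MetaKeyNotExist(filename, key)
        else:
            values.append(None)
    return tuple(values)


store = {"a.png": {"label": 3}}
lenient = lookup_meta(store, "a.png", ["label", "missing"])
```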
-
as_flow(batch_size, with_names=True, meta_keys=None, shuffle=False, skip_incomplete=False, names_pattern=None) Construct a DataFlow, which iterates through the files once and only once in an epoch.
The returned DataFSFlow will hold a copy of this instance (obtained by clone()) instead of holding this instance itself.
Parameters:
- batch_size (int) – Size of each mini-batch.
- with_names (bool) – Whether or not to include the file names in the returned flow? (default True)
- meta_keys (None or Iterable[str]) – The keys of the meta data to be included in the returned flow. (default None)
- shuffle (bool) – Whether or not to shuffle the files in each epoch of the flow? Setting this to True will force loading the file list into memory. (default False)
- skip_incomplete (bool) – Whether or not to exclude a mini-batch, if it has fewer items than batch_size? (default False, the final mini-batch will always be visited even if it has fewer items than batch_size)
- names_pattern (None or str or regex) – The file name pattern. If specified, a file is included in the constructed data flow only if its name matches this pattern. Specifying this option will force loading the file list into memory. (default None)
Returns: A dataflow, with each mini-batch having numpy arrays ([filename,] content, [meta-data...]), according to the arguments.
Return type:
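The effect of batch_size and skip_incomplete on a single pass can be illustrated with a small generator. This is a sketch of the batching rule only, using plain lists rather than numpy arrays; iter_batches is a hypothetical helper, not the DataFlow implementation.

```python
def iter_batches(items, batch_size, skip_incomplete=False):
    """Group `items` into lists of `batch_size` elements.

    The trailing, shorter batch is dropped when `skip_incomplete`
    is True and kept otherwise.
    """
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch and not skip_incomplete:
        yield batch


with_tail = list(iter_batches(range(5), batch_size=2))
without_tail = list(iter_batches(range(5), batch_size=2, skip_incomplete=True))
```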
-
batch_get_meta(filenames, meta_keys) Get meta data of files.
Parameters:
Returns: A list of meta values, or None if the corresponding file does not exist.
Return type:
-
batch_isfile(filenames) Check whether or not the files exist.
Parameters: filenames (Iterable[str]) – The names of the files.
Returns:
Return type: list[bool]
-
capacity Get the capacity of this DataFS.
Returns: The capacity object.
Return type: DataFSCapacity
-
clear_and_put_meta(filename, meta_dict=None, **meta_dict_kwargs) Set the meta data of a file. The un-mentioned meta data will be cleared. This method is not necessarily slower than put_meta().
Parameters:
Raises:
- DataFileNotExist – If filename does not exist.
- UnsupportedOperation – If the WRITE_META capacity (and possibly the LIST_META capacity) is absent.
-
clear_meta(filename) Clear all the meta data of a file.
Parameters: filename (str) – The name of the file.
Raises:
- DataFileNotExist – If filename does not exist.
- UnsupportedOperation – If the WRITE_META capacity (and possibly the LIST_META capacity) is absent.
-
clone() Obtain a clone of this DataFS instance.
Returns: The cloned DataFS. Only the construction arguments will be copied. All the internal states (e.g., database connections) are kept un-initialized.
Return type: DataFS
-
count() Count the files in this DataFS.
Will iterate through all the files via iter_names(), if the QUICK_COUNT capacity is absent.
Returns: The total number of files.
Return type: int
-
get_data(filename) Get the content of a file.
Parameters: filename (str) – The name of the file.
Returns: The content of the file.
Return type: bytes
Raises: DataFileNotExist – If filename does not exist.
-
get_meta(filename, meta_keys) Get meta data of a file.
Parameters:
Returns: The meta values, corresponding to meta_keys. If a requested key is absent for a file, None will take its place.
Return type: tuple[any]
Raises:
- DataFileNotExist – If filename does not exist.
- UnsupportedOperation – If the READ_META capacity is absent.
-
get_meta_dict(filename) Get all the meta data of a file, as a dict.
Parameters: filename (str) – The name of the file.
Returns: The meta values, as a dict.
Return type:
Raises:
- DataFileNotExist – If filename does not exist.
- UnsupportedOperation – If the READ_META or LIST_META capacity is absent.
-
isfile(filename) Check whether or not a file exists.
Parameters: filename (str) – The name of the file.
Returns:
Return type: bool
-
iter_files(meta_keys=None) Iterate through all the files in this DataFS.
Parameters: meta_keys (None or Iterable[str]) – The keys of the meta data to be retrieved. (default None)
Yields: (filename, content, [meta-data…]) – A tuple containing the name of a file, its content, and the values of each meta data corresponding to meta_keys. If a requested key is absent for a file, None will take its place.
Raises: UnsupportedOperation – If meta_keys is specified, but the READ_META capacity is absent.
-
iter_names() Iterate through all the file names in this DataFS.
Yields: str – The file name of each file.
-
list_meta(filename) List the meta keys of a file.
Parameters: filename (str) – The name of the file.
Returns: The keys of the meta data of the file.
Return type:
Raises:
- DataFileNotExist – If filename does not exist.
- UnsupportedOperation – If the LIST_META capacity is absent.
-
list_names() Get the list of all the file names.
Returns: The file names list.
Return type: list[str]
-
open(filename, mode) Open a file-like object to read / write a file.
Parameters:
- filename (str) – The name of the file.
- mode ({'r', 'w'}) – The open mode of the file, either 'r' for reading or 'w' for writing. Other modes are not supported in general.
Returns: The file-like object. This object will be closed immediately when this DataFS instance is closed.
Return type: file-like
Raises:
- InvalidOpenMode – If the specified mode is not supported, e.g., mode == 'w' but the WRITE_DATA capacity is absent.
- DataFileNotExist – If mode == 'r' but filename does not exist.
-
put_data(filename, data) Save the content of a file.
Parameters:
Raises: UnsupportedOperation – If the WRITE_DATA capacity is absent.
-
put_meta(filename, meta_dict=None, **meta_dict_kwargs) Update the meta data of a file. The un-mentioned meta data will remain unchanged. This method is not necessarily faster than clear_and_put_meta(). In some backends it may be implemented by first calling get_meta_dict, then updating the meta dict in memory, and finally calling clear_and_put_meta.
Parameters:
Raises:
- DataFileNotExist – If filename does not exist.
- UnsupportedOperation – If the WRITE_META capacity (and possibly the READ_META capacity) is absent.
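The read, update, replace strategy described for put_meta can be sketched against an in-memory dict of dicts. The two functions below are standalone illustrations that reuse the documented names for readability; they are not the library's methods, and a plain KeyError stands in for DataFileNotExist.

```python
def clear_and_put_meta(store, filename, meta_dict=None, **meta_dict_kwargs):
    """Replace all meta data of `filename`; unmentioned keys are dropped."""
    if filename not in store:
        raise KeyError(filename)  # stand-in for DataFileNotExist
    new_meta = dict(meta_dict or {})
    new_meta.update(meta_dict_kwargs)
    store[filename] = new_meta


def put_meta(store, filename, meta_dict=None, **meta_dict_kwargs):
    """Update meta data of `filename`; unmentioned keys are kept."""
    merged = dict(store[filename])      # read the current meta dict
    merged.update(meta_dict or {})      # update it in memory
    merged.update(meta_dict_kwargs)
    clear_and_put_meta(store, filename, merged)  # write it back whole


store = {"a.png": {"label": 3, "split": "train"}}
put_meta(store, "a.png", {"label": 4}, width=32)
after_put = dict(store["a.png"])
clear_and_put_meta(store, "a.png", {"label": 5})
after_clear = dict(store["a.png"])
```

This also shows why such a put_meta may need a read capacity in addition to WRITE_META: the merge step has to fetch the existing values first.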
-
random_flow(batch_size, with_names=True, meta_keys=None, skip_incomplete=False, batch_count=None) Construct a DataFlow, with an infinite or pre-configured number of mini-batches in an epoch, randomly sampled from the whole DataFS.
The returned DataFSRandomFlow will hold a copy of this instance (obtained by clone()) instead of holding this instance itself.
Parameters:
- batch_size (int) – Size of each mini-batch.
- with_names (bool) – Whether or not to include the file names in the returned flow? (default True)
- meta_keys (None or Iterable[str]) – The keys of the meta data to be included in the returned flow. (default None)
- skip_incomplete (bool) – Whether or not to exclude a mini-batch, if it has fewer items than batch_size? (default False, the final mini-batch will always be visited even if it has fewer items than batch_size)
- batch_count (int or None) – The number of mini-batches to obtain in an epoch. (default None, infinite mini-batches)
Returns: A dataflow, with each mini-batch having numpy arrays ([filename,] content, [meta-data...]), according to the arguments.
Return type:
Raises: UnsupportedOperation – If the RANDOM_SAMPLE capacity is absent.
-
retrieve(filename, meta_keys=None) Retrieve the content and maybe meta data of a file.
Parameters:
Returns: The content, or a tuple containing the content and the meta values, corresponding to meta_keys. If a requested key is absent for a file, None will take its place.
Return type:
Notes
As long as meta_keys is not None, a tuple will always be returned, even if meta_keys is an empty collection.
Raises:
- UnsupportedOperation – If meta_keys is specified, but the READ_META capacity is absent.
- DataFileNotExist – If filename does not exist.
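The return-shape rule in the note can be sketched with two in-memory dicts. The flat tuple layout (content, value, value, ...) is an assumption made here by analogy with the tuples yielded by iter_files; retrieve_sketch is a hypothetical helper, not the library method.

```python
def retrieve_sketch(data, meta, filename, meta_keys=None):
    """Return the content alone, or a tuple when `meta_keys` is not None."""
    content = data[filename]
    if meta_keys is None:
        return content
    file_meta = meta.get(filename, {})
    # Missing keys are reported as None (non-strict behaviour).
    return (content,) + tuple(file_meta.get(key) for key in meta_keys)


data = {"a.png": b"\x89PNG"}
meta = {"a.png": {"label": 3}}

only_content = retrieve_sketch(data, meta, "a.png")
empty_keys = retrieve_sketch(data, meta, "a.png", meta_keys=[])
with_meta = retrieve_sketch(data, meta, "a.png", meta_keys=["label", "missing"])
```

Returning a tuple even for an empty meta_keys keeps the result shape predictable for callers that unpack it.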
-
sample_files(n_samples, meta_keys=None) Sample n_samples files from this DataFS.
Parameters:
Returns: A list of tuples, where each tuple contains the name of a file, its content, and the values of each meta data corresponding to meta_keys. If a requested key is absent for a file, None will take its place.
Return type: list[(filename, content, [meta-data…])]
Raises: UnsupportedOperation – If the RANDOM_SAMPLE capacity is absent, or meta_keys is specified, but the READ_META capacity is absent.
-
sample_names(n_samples) Sample n_samples file names from this DataFS.
Parameters: n_samples (int) – Number of names to sample. The returned names may be fewer than this number, if there are fewer than n_samples files in this DataFS.
Returns: The list of sampled file names.
Return type: list[str]
Raises: UnsupportedOperation – If the RANDOM_SAMPLE capacity is absent.
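The "may return fewer names" rule can be sketched with the standard-library random module. Sampling without replacement is an assumption inferred from that rule, and sample_names_sketch is a hypothetical helper that works on an in-memory list, which a backend with the RANDOM_SAMPLE capacity would not need.

```python
import random


def sample_names_sketch(names, n_samples, rng=random):
    """Sample up to `n_samples` distinct names from `names`."""
    names = list(names)
    # Cap the request so that random.sample never raises ValueError.
    return rng.sample(names, min(n_samples, len(names)))


few = sample_names_sketch(["a", "b"], 5, rng=random.Random(0))
some = sample_names_sketch(["a", "b", "c", "d"], 2, rng=random.Random(0))
```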
-
sub_flow(batch_size, names, with_names=True, meta_keys=None, shuffle=False, skip_incomplete=False) Construct a DataFlow, which iterates through the files according to selected names.
The returned DataFSFlow will hold a copy of this instance (obtained by clone()) instead of holding this instance itself.
Parameters:
- batch_size (int) – Size of each mini-batch.
- names (list[str] or np.ndarray[str]) – The names to retrieve.
- with_names (bool) – Whether or not to include the file names in the returned flow? (default True)
- meta_keys (None or Iterable[str]) – The keys of the meta data to be included in the returned flow. (default None)
- shuffle (bool) – Whether or not to shuffle the files in each epoch of the flow? Setting this to True will force loading the file list into memory. (default False)
- skip_incomplete (bool) – Whether or not to exclude a mini-batch, if it has fewer items than batch_size? (default False, the final mini-batch will always be visited even if it has fewer items than batch_size)
Returns: A dataflow, with each mini-batch having numpy arrays ([filename,] content, [meta-data...]), according to the arguments.
Return type:
-
-
exception mlsnippet.datafs.DataFSError
Bases: exceptions.Exception
Base class for all DataFS errors.
-
exception mlsnippet.datafs.UnsupportedOperation
Bases: mlsnippet.datafs.errors.DataFSError
Class to indicate that a requested operation is not supported by the specific DataFS subclass.
-
exception mlsnippet.datafs.InvalidOpenMode(mode)
Bases: mlsnippet.datafs.errors.UnsupportedOperation
Class to indicate that the specified open mode is not supported.
-
mode
-
-
exception mlsnippet.datafs.DataFileNotExist(filename)
Bases: mlsnippet.datafs.errors.DataFSError
Class to indicate a requested data file does not exist.
-
filename
-
-
exception mlsnippet.datafs.MetaKeyNotExist(filename, meta_key)
Bases: mlsnippet.datafs.errors.DataFSError
Class to indicate a requested meta key does not exist.
-
filename
-
meta_key
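Because InvalidOpenMode derives from UnsupportedOperation, a handler for the broader class also covers rejected open modes. The sketch below rebuilds the documented hierarchy with local stand-in classes so that it runs without the library; open_read_only is a hypothetical function used only to trigger the error.

```python
class DataFSError(Exception):
    """Local stand-in for the documented base error."""


class UnsupportedOperation(DataFSError):
    """Local stand-in: the operation is not supported."""


class InvalidOpenMode(UnsupportedOperation):
    """Local stand-in carrying the rejected `mode`."""

    def __init__(self, mode):
        super().__init__(mode)
        self.mode = mode


class DataFileNotExist(DataFSError):
    """Local stand-in carrying the missing `filename`."""

    def __init__(self, filename):
        super().__init__(filename)
        self.filename = filename


def open_read_only(files, filename, mode):
    """Hypothetical read-only open: only mode 'r' is accepted."""
    if mode != "r":
        raise InvalidOpenMode(mode)
    if filename not in files:
        raise DataFileNotExist(filename)
    return files[filename]


try:
    open_read_only({"a.txt": b"hi"}, "a.txt", "w")
    caught = None
except UnsupportedOperation as exc:
    # InvalidOpenMode is caught here through its base class.
    caught = exc
```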
-
-
class mlsnippet.datafs.LocalFS(root_dir, strict=False)
Bases: mlsnippet.datafs.base.DataFS
Local directory based DataFS.
-
_close() Override this method to destroy the internal states.
-
_init() Override this method to initialize the internal states.
-
clear_meta(filename) Clear all the meta data of a file.
Parameters: filename (str) – The name of the file.
Raises:
- DataFileNotExist – If filename does not exist.
- UnsupportedOperation – If the WRITE_META capacity (and possibly the LIST_META capacity) is absent.
-
clone() Obtain a clone of this DataFS instance.
Returns: The cloned DataFS. Only the construction arguments will be copied. All the internal states (e.g., database connections) are kept un-initialized.
Return type: DataFS
-
get_meta(filename, meta_keys) Get meta data of a file.
Parameters:
Returns: The meta values, corresponding to meta_keys. If a requested key is absent for a file, None will take its place.
Return type: tuple[any]
Raises:
- DataFileNotExist – If filename does not exist.
- UnsupportedOperation – If the READ_META capacity is absent.
-
isfile(filename) Check whether or not a file exists.
Parameters: filename (str) – The name of the file.
Returns:
Return type: bool
-
iter_names() Iterate through all the file names in this DataFS.
Yields: str – The file name of each file.
-
list_meta(filename) List the meta keys of a file.
Parameters: filename (str) – The name of the file.
Returns: The keys of the meta data of the file.
Return type:
Raises:
- DataFileNotExist – If filename does not exist.
- UnsupportedOperation – If the LIST_META capacity is absent.
-
open(filename, mode) Open a file-like object to read / write a file.
Parameters:
- filename (str) – The name of the file.
- mode ({'r', 'w'}) – The open mode of the file, either 'r' for reading or 'w' for writing. Other modes are not supported in general.
Returns: The file-like object. This object will be closed immediately when this DataFS instance is closed.
Return type: file-like
Raises:
- InvalidOpenMode – If the specified mode is not supported, e.g., mode == 'w' but the WRITE_DATA capacity is absent.
- DataFileNotExist – If mode == 'r' but filename does not exist.
-
put_meta(filename, meta_dict=None, **meta_dict_kwargs) Update the meta data of a file. The un-mentioned meta data will remain unchanged. This method is not necessarily faster than clear_and_put_meta(). In some backends it may be implemented by first calling get_meta_dict, then updating the meta dict in memory, and finally calling clear_and_put_meta.
Parameters:
Raises:
- DataFileNotExist – If filename does not exist.
- UnsupportedOperation – If the WRITE_META capacity (and possibly the READ_META capacity) is absent.
-
root_dir Get the absolute path of the root directory.
-
sample_names(n_samples) Sample n_samples file names from this DataFS.
Parameters: n_samples (int) – Number of names to sample. The returned names may be fewer than this number, if there are fewer than n_samples files in this DataFS.
Returns: The list of sampled file names.
Return type: list[str]
Raises: UnsupportedOperation – If the RANDOM_SAMPLE capacity is absent.
-
-
class mlsnippet.datafs.MongoFS(conn_str, db_name, coll_name, strict=False)
Bases: mlsnippet.datafs.base.DataFS, mlsnippet.utils.mongo_binder.MongoBinder
MongoDB GridFS based DataFS.
This class provides a DataFS, which saves the files in a MongoDB GridFS, and stores the meta values in the metadata field of each record in the fs collection.
-
batch_get_meta(filenames, meta_keys) Get meta data of files.
Parameters:
Returns: A list of meta values, or None if the corresponding file does not exist.
Return type:
-
batch_isfile(filenames) Check whether or not the files exist.
Parameters: filenames (Iterable[str]) – The names of the files.
Returns:
Return type: list[bool]
-
clear_and_put_meta(filename, meta_dict=None, **meta_dict_kwargs) Set the meta data of a file. The un-mentioned meta data will be cleared. This method is not necessarily slower than put_meta().
Parameters:
Raises:
- DataFileNotExist – If filename does not exist.
- UnsupportedOperation – If the WRITE_META capacity (and possibly the LIST_META capacity) is absent.
-
clear_meta(filename) Clear all the meta data of a file.
Parameters: filename (str) – The name of the file.
Raises:
- DataFileNotExist – If filename does not exist.
- UnsupportedOperation – If the WRITE_META capacity (and possibly the LIST_META capacity) is absent.
-
clone() Obtain a clone of this DataFS instance.
Returns: The cloned DataFS. Only the construction arguments will be copied. All the internal states (e.g., database connections) are kept un-initialized.
Return type: DataFS
-
count() Count the files in this DataFS.
Will iterate through all the files via iter_names(), if the QUICK_COUNT capacity is absent.
Returns: The total number of files.
Return type: int
-
get_meta(filename, meta_keys) Get meta data of a file.
Parameters:
Returns: The meta values, corresponding to meta_keys. If a requested key is absent for a file, None will take its place.
Return type: tuple[any]
Raises:
- DataFileNotExist – If filename does not exist.
- UnsupportedOperation – If the READ_META capacity is absent.
-
get_meta_dict(filename) Get all the meta data of a file, as a dict.
Parameters: filename (str) – The name of the file.
Returns: The meta values, as a dict.
Return type:
Raises:
- DataFileNotExist – If filename does not exist.
- UnsupportedOperation – If the READ_META or LIST_META capacity is absent.
-
isfile(filename) Check whether or not a file exists.
Parameters: filename (str) – The name of the file.
Returns:
Return type: bool
-
iter_files(meta_keys=None) Iterate through all the files in this DataFS.
Parameters: meta_keys (None or Iterable[str]) – The keys of the meta data to be retrieved. (default None)
Yields: (filename, content, [meta-data…]) – A tuple containing the name of a file, its content, and the values of each meta data corresponding to meta_keys. If a requested key is absent for a file, None will take its place.
Raises: UnsupportedOperation – If meta_keys is specified, but the READ_META capacity is absent.
-
iter_names() Iterate through all the file names in this DataFS.
Yields: str – The file name of each file.
-
list_meta(filename) List the meta keys of a file.
Parameters: filename (str) – The name of the file.
Returns: The keys of the meta data of the file.
Return type:
Raises:
- DataFileNotExist – If filename does not exist.
- UnsupportedOperation – If the LIST_META capacity is absent.
-
open(filename, mode) Open a file-like object to read / write a file.
Parameters:
- filename (str) – The name of the file.
- mode ({'r', 'w'}) – The open mode of the file, either 'r' for reading or 'w' for writing. Other modes are not supported in general.
Returns: The file-like object. This object will be closed immediately when this DataFS instance is closed.
Return type: file-like
Raises:
- InvalidOpenMode – If the specified mode is not supported, e.g., mode == 'w' but the WRITE_DATA capacity is absent.
- DataFileNotExist – If mode == 'r' but filename does not exist.
-
put_data(filename, data) Save the content of a file.
Parameters:
Raises: UnsupportedOperation – If the WRITE_DATA capacity is absent.
-
put_meta(filename, meta_dict=None, **meta_dict_kwargs) Update the meta data of a file. The un-mentioned meta data will remain unchanged. This method is not necessarily faster than clear_and_put_meta(). In some backends it may be implemented by first calling get_meta_dict, then updating the meta dict in memory, and finally calling clear_and_put_meta.
Parameters:
Raises:
- DataFileNotExist – If filename does not exist.
- UnsupportedOperation – If the WRITE_META capacity (and possibly the READ_META capacity) is absent.
-
retrieve(filename, meta_keys=None) Retrieve the content and maybe meta data of a file.
Parameters:
Returns: The content, or a tuple containing the content and the meta values, corresponding to meta_keys. If a requested key is absent for a file, None will take its place.
Return type:
Notes
As long as meta_keys is not None, a tuple will always be returned, even if meta_keys is an empty collection.
Raises:
- UnsupportedOperation – If meta_keys is specified, but the READ_META capacity is absent.
- DataFileNotExist – If filename does not exist.
-
sample_files(n_samples, meta_keys=None) Sample n_samples files from this DataFS.
Parameters:
Returns: A list of tuples, where each tuple contains the name of a file, its content, and the values of each meta data corresponding to meta_keys. If a requested key is absent for a file, None will take its place.
Return type: list[(filename, content, [meta-data…])]
Raises: UnsupportedOperation – If the RANDOM_SAMPLE capacity is absent, or meta_keys is specified, but the READ_META capacity is absent.
-
sample_names(n_samples) Sample n_samples file names from this DataFS.
Parameters: n_samples (int) – Number of names to sample. The returned names may be fewer than this number, if there are fewer than n_samples files in this DataFS.
Returns: The list of sampled file names.
Return type: list[str]
Raises: UnsupportedOperation – If the RANDOM_SAMPLE capacity is absent.
-
-
class mlsnippet.datafs.DataFSForwardFlow(fs, batch_size, with_names=True, meta_keys=None, skip_incomplete=False)
Bases: mlsnippet.datafs.dataflow._BaseDataFSFlow
A DataFS derived DataFlow, iterating through mini-batches in a forward-only fashion (data are obtained by iter_files()).
-
__init__(fs, batch_size, with_names=True, meta_keys=None, skip_incomplete=False) Construct a new DataFSForwardFlow.
Parameters:
- fs (DataFS) – The DataFS instance to read data from.
- batch_size (int) – Size of each mini-batch.
- with_names (bool) – Whether or not to include the file names in mini-batches? (default True)
- meta_keys (None or Iterable[str]) – The keys of the meta data to be included in mini-batches. (default None)
- skip_incomplete (bool) – Whether or not to exclude a mini-batch, if it has fewer items than batch_size? (default False, the final mini-batch will always be visited even if it has fewer items than batch_size)
-
-
class mlsnippet.datafs.DataFSIndexedFlow(fs, batch_size, names, with_names=True, meta_keys=None, shuffle=False, skip_incomplete=False, random_state=None)
Bases: mlsnippet.datafs.dataflow._BaseDataFSFlow
A DataFS derived DataFlow, iterating through mini-batches according to given names (data are obtained by retrieve()).
-
__init__(fs, batch_size, names, with_names=True, meta_keys=None, shuffle=False, skip_incomplete=False, random_state=None) Construct a new DataFSIndexedFlow.
Parameters:
- fs (DataFS) – The DataFS instance to read data from.
- batch_size (int) – Size of each mini-batch.
- names (list[str] or np.ndarray[str]) – The names to retrieve.
- with_names (bool) – Whether or not to include the file names in mini-batches? (default True)
- meta_keys (None or Iterable[str]) – The keys of the meta data to be included in mini-batches. (default None)
- shuffle (bool) – Whether or not to shuffle the name indices before each epoch? (default False)
- skip_incomplete (bool) – Whether or not to exclude a mini-batch, if it has fewer items than batch_size? (default False, the final mini-batch will always be visited even if it has fewer items than batch_size)
- random_state (RandomState) – Optional numpy RandomState for shuffling data before each epoch. (default None, use the global RandomState)
-
is_shuffled Whether or not to shuffle the names before each epoch?
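Index-based iteration with optional per-epoch shuffling can be sketched as follows. The standard-library random.Random is used here as a stand-in for the numpy RandomState mentioned above, and epoch_batches is a hypothetical helper that yields lists of names rather than numpy arrays.

```python
import random


def epoch_batches(names, batch_size, shuffle=False, skip_incomplete=False,
                  rng=None):
    """Yield one epoch of name batches, optionally in shuffled order."""
    indices = list(range(len(names)))
    if shuffle:
        # Shuffle the indices, not the names, so `names` stays untouched.
        (rng or random).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        if skip_incomplete and len(chunk) < batch_size:
            break
        yield [names[i] for i in chunk]


names = ["a", "b", "c"]
ordered = list(epoch_batches(names, batch_size=2))
shuffled = list(epoch_batches(names, batch_size=2, shuffle=True,
                              rng=random.Random(0)))
```

Calling the generator again for the next epoch draws a fresh permutation, which is what shuffling before each epoch amounts to.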
-
-
class mlsnippet.datafs.DataFSRandomFlow(fs, batch_size, with_names=True, meta_keys=None, batch_count=None, skip_incomplete=False)
Bases: mlsnippet.datafs.dataflow._BaseDataFSFlow
A DataFS derived DataFlow, obtaining random samples from the DataFS.
-
__init__(fs, batch_size, with_names=True, meta_keys=None, batch_count=None, skip_incomplete=False) Construct a new DataFSRandomFlow.
Parameters:
- fs (DataFS) – The DataFS instance to read data from.
- batch_size (int) – Size of each mini-batch.
- with_names (bool) – Whether or not to include the file names in mini-batches? (default True)
- meta_keys (None or Iterable[str]) – The keys of the meta data to be included in mini-batches. (default None)
- batch_count (int or None) – The number of mini-batches to obtain in an epoch. (default None, infinite mini-batches)
- skip_incomplete (bool) – Whether or not to exclude a mini-batch, if it has fewer items than batch_size? (default False, the final mini-batch will always be visited even if it has fewer items than batch_size)
-
batch_count Get the number of mini-batches to obtain in an epoch.
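The difference between a bounded and an unbounded batch_count can be sketched with a generator that either counts to a limit or runs forever. random_batches is a hypothetical helper that samples from an in-memory list with the standard-library random module; it does not reflect how the flow queries its backend.

```python
import itertools
import random


def random_batches(names, batch_size, batch_count=None, rng=random):
    """Yield randomly sampled name batches.

    With `batch_count=None` the generator never ends, so the caller
    decides when to stop.
    """
    names = list(names)
    counter = itertools.count() if batch_count is None else range(batch_count)
    for _ in counter:
        yield rng.sample(names, min(batch_size, len(names)))


pool = ["a", "b", "c", "d"]
bounded = list(random_batches(pool, 2, batch_count=3, rng=random.Random(0)))
# Take five batches from the endless variant.
endless_head = list(itertools.islice(
    random_batches(pool, 2, rng=random.Random(0)), 5))
```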
-