HappyBase Package#
Google Cloud Bigtable HappyBase package.
This package is intended to emulate the HappyBase library using Google Cloud Bigtable as the backing store.
Differences in Public API#
Some concepts from HBase/Thrift do not map directly to the Cloud Bigtable API. As a result:

- Table.regions() could not be implemented, since tables in Cloud Bigtable do not expose internal storage details.
- Connection.enable_table() does nothing, since Cloud Bigtable has no concept of enabled/disabled tables.
- Connection.disable_table() does nothing, since Cloud Bigtable has no concept of enabled/disabled tables.
- Connection.is_table_enabled() always returns True, since Cloud Bigtable has no concept of enabled/disabled tables.
- Connection.compact_table() does nothing, since Cloud Bigtable handles table compactions automatically and does not expose an API for them.
- The __version__ value for the HappyBase package is None. However, it is worth noting that this implementation was based on HappyBase 0.9.
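Because these methods are inert, HappyBase code that calls them keeps working unchanged. A minimal sketch of the no-op behavior (the `connection` argument is assumed to be an open Connection, and 'my-table' is a placeholder name):

```python
def check_compat_shims(connection):
    # Sketch only: `connection` is assumed to be an open
    # google.cloud.happybase Connection; 'my-table' is a placeholder.
    connection.enable_table('my-table')     # no-op
    connection.disable_table('my-table')    # no-op
    connection.compact_table('my-table')    # no-op
    assert connection.is_table_enabled('my-table')  # always True
```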
In addition, many of the constants from connection are specific to HBase and are defined as None in our module:

- COMPAT_MODES
- THRIFT_TRANSPORTS
- THRIFT_PROTOCOLS
- DEFAULT_HOST
- DEFAULT_PORT
- DEFAULT_TRANSPORT
- DEFAULT_COMPAT
- DEFAULT_PROTOCOL

Two of these, DEFAULT_HOST and DEFAULT_PORT, are even imported in the main happybase package.
Finally, we do not provide the util
module. Though it is public in the
HappyBase library, it provides no core functionality.
API Behavior Changes#
Since there is no concept of an enabled/disabled table, calling Connection.delete_table() with disable=True can't be supported. Using that argument will result in a warning.

The Connection constructor disables the use of several arguments and will print a warning if any of them are passed in as keyword arguments. The arguments are:

- host
- port
- compat
- transport
- protocol
In order to make Connection compatible with Cloud Bigtable, we add an instance keyword argument to allow users to pass in their own Instance (which they can construct beforehand). For example:

    from google.cloud.bigtable.client import Client

    client = Client(project=PROJECT_ID, admin=True)
    instance = client.instance(instance_id, location_id)
    instance.reload()

    from google.cloud.happybase import Connection

    connection = Connection(instance=instance)
Any uses of the wal (Write Ahead Log) argument will result in a warning as well. This includes uses in Table.batch(), Table.put(), Table.delete(), and the Batch class (its constructor, Batch.put() and Batch.delete()).

When calling Connection.create_table(), the majority of HBase column family options cannot be used. Among

- max_versions
- compression
- in_memory
- bloom_filter_type
- bloom_filter_vector_size
- bloom_filter_nb_hashes
- block_cache_enabled
- time_to_live

only max_versions and time_to_live are available in Cloud Bigtable (as MaxVersionsGCRule and MaxAgeGCRule). In addition to using a dictionary for specifying column family options, we also accept instances of GarbageCollectionRule or subclasses.

Table.scan() no longer accepts the following arguments (which will result in a warning):

- batch_size
- scan_batching
- sorted_columns

Using an HBase filter string in Table.scan() is not possible with Cloud Bigtable and will result in a TypeError. However, the method now accepts instances of RowFilter and subclasses.

Batch.delete() (and hence Table.delete()) will fail with a ValueError when either a row or column family delete is attempted with a timestamp. This is because the Cloud Bigtable API uses the DeleteFromFamily and DeleteFromRow mutations for these deletes, and neither of these mutations supports a timestamp.
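To illustrate the two accepted forms of column family options, here is a brief sketch. It assumes an open `connection`; the table and family names are placeholders, and the MaxVersionsGCRule import path is the one used by the google-cloud-bigtable package:

```python
def create_tables(connection):
    # Deferred import so the sketch reads standalone; this class comes
    # from the google-cloud-bigtable package.
    from google.cloud.bigtable.column_family import MaxVersionsGCRule

    # HappyBase-style dictionary: only max_versions and time_to_live
    # are honored by Cloud Bigtable; other keys cause a warning.
    connection.create_table('t1', {'cf1': {'max_versions': 3}})

    # Equivalent, passing a GarbageCollectionRule subclass directly.
    connection.create_table('t2', {'cf1': MaxVersionsGCRule(3)})
```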
HappyBase Connection#
Google Cloud Bigtable HappyBase connection module.
-
class
google.cloud.happybase.connection.
Connection
(timeout=None, autoconnect=True, table_prefix=None, table_prefix_separator='_', instance=None, **kwargs)[source]# Bases:
object
Connection to Cloud Bigtable backend.
Note
If you pass an instance, it will be Instance.copy()-ed before being stored on the new connection. This also copies the Client that created the Instance and the Credentials stored on the client.

The arguments
host
,port
,compat
,transport
andprotocol
are allowed (as keyword arguments) for compatibility with HappyBase. However, they will not be used in any way, and will cause a warning if passed.Parameters: - timeout (int) – (Optional) The socket timeout in milliseconds.
- autoconnect (bool) – (Optional) Whether the connection should be
open()
-ed during construction. - table_prefix (str) – (Optional) Prefix used to construct table names.
- table_prefix_separator (str) – (Optional) Separator used with
table_prefix
. Defaults to_
. - instance (
Instance
) – (Optional) A Cloud Bigtable instance. The instance also owns a client for making gRPC requests to the Cloud Bigtable API. If not passed in, defaults to creating client withadmin=True
and using thetimeout
here for thetimeout_seconds
argument to theClient
constructor. The credentials for the client will be the implicit ones loaded from the environment. Then that client is used to retrieve all the instances owned by the client’s project. - kwargs (dict) – Remaining keyword arguments. Provided for HappyBase compatibility.
-
close
()[source]# Close the underlying transport to Cloud Bigtable.
This method does nothing and is provided for compatibility.
-
static
compact_table
(name, major=False)[source]# Compact the specified table.
Warning
Cloud Bigtable supports table compactions, it just doesn’t expose an API for that feature, so this method does nothing. It is provided simply for compatibility.
Parameters: - name (str) – The name of the table to compact. - major (bool) – Whether to perform a major compaction.
-
create_table
(name, families)[source]# Create a table.
Warning
The only column family options from HappyBase that are able to be used with Cloud Bigtable are
max_versions
andtime_to_live
.Values in
families
represent column family options. In HappyBase, these are dictionaries, corresponding to theColumnDescriptor
structure in the Thrift API. The accepted keys are:max_versions
(int
)compression
(str
)in_memory
(bool
)bloom_filter_type
(str
)bloom_filter_vector_size
(int
)bloom_filter_nb_hashes
(int
)block_cache_enabled
(bool
)time_to_live
(int
)
Parameters: - name (str) – The name of the table to be created. - families (dict) – Dictionary with column family names as keys and column family options as values. Raises: - TypeError – If
families
is not a dictionary. - ValueError – If
families
has no entries. - AlreadyExists – If creation fails due to an already existing table.
- NetworkError – If creation fails for a reason other than table exists.
-
delete_table
(name, disable=False)[source]# Delete the specified table.
Parameters: - name (str) – The name of the table to be deleted. If table_prefix is set, a prefix will be added to the name. - disable (bool) – Whether to first disable the table if needed. This is provided for compatibility with HappyBase, but is not relevant for Cloud Bigtable since it has no concept of enabled / disabled tables.
-
static
disable_table
(name)[source]# Disable the specified table.
Warning
Cloud Bigtable has no concept of enabled / disabled tables so this method does nothing. It is provided simply for compatibility.
Parameters: name (str) – The name of the table to be disabled.
-
static
enable_table
(name)[source]# Enable the specified table.
Warning
Cloud Bigtable has no concept of enabled / disabled tables so this method does nothing. It is provided simply for compatibility.
Parameters: name (str) – The name of the table to be enabled.
-
static
is_table_enabled
(name)[source]# Return whether the specified table is enabled.
Warning
Cloud Bigtable has no concept of enabled / disabled tables so this method always returns
True
. It is provided simply for compatibility.Parameters: name (str) – The name of the table to check enabled / disabled status. Return type: bool Returns: The value True
always.
-
open
()[source]# Open the underlying transport to Cloud Bigtable.
This method does nothing and is provided for compatibility.
-
table
(name, use_prefix=True)[source]# Table factory.
Parameters: - name (str) – The name of the table. - use_prefix (bool) – Whether to use the table prefix (if any). Return type: Table Returns: Table instance owned by this connection.
-
tables
()[source]# Return a list of table names available to this connection.
Note
This lists every table in the instance owned by this connection, not every table that a given user may have access to.
Note
If
table_prefix
is set on this connection, only returns the table names which match that prefix.Return type: list Returns: List of string table names.
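Putting the table-related methods together, a short sketch (assuming an open `connection`; the table and family names are placeholders, and the function itself is illustrative rather than part of the package):

```python
def table_lifecycle(connection):
    # Create a table with one column family keeping a single version.
    connection.create_table('my-table', {'cf1': {'max_versions': 1}})

    # tables() lists every table in the instance (filtered by any
    # table_prefix set on the connection).
    assert 'my-table' in connection.tables()

    # table() is the factory for Table objects; it honors table_prefix.
    table = connection.table('my-table')

    connection.delete_table('my-table')
    return table
```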
HappyBase Connection Pool#
Google Cloud Bigtable HappyBase pool module.
-
class
google.cloud.happybase.pool.
ConnectionPool
(size, **kwargs)[source]# Bases:
object
Thread-safe connection pool.
Note
All keyword arguments are passed unmodified to the
Connection
constructor except forautoconnect
. This is because theopen
/closed
status of a connection is managed by the pool. In addition, ifinstance
is not passed, the default / inferred instance is determined by the pool and then passed to eachConnection
that is created.Parameters: - size (int) – The maximum number of concurrently open connections. - kwargs (dict) – Keyword arguments passed to the Connection constructor. Raises: TypeError if size is not an integer. ValueError if size is not positive.
-
connection
(*args, **kwds)[source]# Obtain a connection from the pool.
Must be used as a context manager, for example:
    with pool.connection() as connection:
        pass  # do something with the connection
If
timeout
is omitted, this method waits forever for a connection to become available from the local queue.Yields an active
Connection
from the pool.Parameters: timeout (int) – (Optional) Time (in seconds) to wait for a connection to open. Raises: NoConnectionsAvailable
if no connection can be retrieved from the pool before thetimeout
(only if a timeout is specified).
-
-
exception
google.cloud.happybase.pool.
NoConnectionsAvailable
[source]# Bases:
exceptions.RuntimeError
Exception raised when no connections are available.
This happens if a timeout was specified when obtaining a connection, and no connection became available within the specified timeout.
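A short sketch of typical pool usage; the pool size and timeout are arbitrary, `instance` is assumed to be a pre-built Instance, and the top-level import path from google.cloud.happybase is assumed:

```python
def use_pool(instance):
    # Deferred imports so the sketch reads standalone.
    from google.cloud.happybase import ConnectionPool, NoConnectionsAvailable

    # autoconnect is managed by the pool, so it must not be passed here.
    pool = ConnectionPool(size=10, instance=instance)
    try:
        # Waits up to 5 seconds for a free connection, then raises.
        with pool.connection(timeout=5) as connection:
            print(connection.tables())
    except NoConnectionsAvailable:
        pass  # all 10 connections were busy for the full 5 seconds
```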
HappyBase Table#
Google Cloud Bigtable HappyBase table module.
-
class
google.cloud.happybase.table.
Table
(name, connection)[source]# Bases:
object
Representation of Cloud Bigtable table.
Used for adding data to, and retrieving data from, a Cloud Bigtable table.
Parameters: - name (str) – The name of the table.
- connection (
Connection
) – The connection which has access to the table.
-
batch
(timestamp=None, batch_size=None, transaction=False, wal=<object object>)[source]# Create a new batch operation for this table.
This method returns a new
Batch
instance that can be used for mass data manipulation.Parameters: - timestamp (int) – (Optional) Timestamp (in milliseconds since the epoch) that all mutations will be applied at.
- batch_size (int) – (Optional) The maximum number of mutations to allow to accumulate before committing them.
- transaction (bool) – Flag indicating if the mutations should be sent
transactionally or not. If
transaction=True
and an error occurs while aBatch
is active, then none of the accumulated mutations will be committed. Ifbatch_size
is set, the mutation can’t be transactional. - wal (object) – Unused parameter (to be passed to the created batch). Provided for compatibility with HappyBase, but irrelevant for Cloud Bigtable since it does not have a Write Ahead Log.
Return type: Batch
Returns: A batch bound to this table.
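For example, a sketch of a transactional batch (assuming `table` is a Table from an open connection; keys, columns and values are placeholders):

```python
def transactional_writes(table):
    # With transaction=True, if an error escapes the with block, none
    # of the accumulated mutations are committed.
    with table.batch(transaction=True) as batch:
        batch.put('row-1', {'cf:col1': 'value1'})
        batch.put('row-2', {'cf:col1': 'value2'})
    # Exiting the block commits both puts in a single request.
```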
-
cells
(row, column, versions=None, timestamp=None, include_timestamp=False)[source]# Retrieve multiple versions of a single cell from the table.
Parameters: - row (str) – Row key for the row we are reading from.
- column (str) – Column we are reading from; of the form
fam:col
. - versions (int) – (Optional) The maximum number of cells to return. If not set, returns all cells found.
- timestamp (int) – (Optional) Timestamp (in milliseconds since the epoch). If specified, only cells returned before (or at) the timestamp will be returned.
- include_timestamp (bool) – Flag to indicate if cell timestamps should be included with the output.
Return type: list Returns: List of values in the cell (with timestamps if
include_timestamp
isTrue
).
-
counter_dec
(row, column, value=1)[source]# Atomically decrement a counter column.
This method atomically decrements a counter column in
row
. If the counter column does not exist, it is automatically initialized to 0 before being decremented.Parameters: - row (str) – Row key where the counter is stored. - column (str) – Column within the row where the counter is stored. - value (int) – Amount to decrement the counter by (defaults to 1). Return type: int Returns: Counter value after decrementing.
-
counter_get
(row, column)[source]# Retrieve the current value of a counter column.
This method retrieves the current value of a counter column. If the counter column does not exist, this function initializes it to
0
.Note
Application code should never store a counter value directly; use the atomic
counter_inc()
andcounter_dec()
methods for that.Parameters: - row (str) – Row key where the counter is stored. - column (str) – Column within the row where the counter is stored. Return type: int Returns: Counter value (after initializing / incrementing by 0).
-
counter_inc
(row, column, value=1)[source]# Atomically increment a counter column.
This method atomically increments a counter column in
row
. If the counter column does not exist, it is automatically initialized to 0 before being incremented.Parameters: - row (str) – Row key where the counter is stored. - column (str) – Column within the row where the counter is stored. - value (int) – Amount to increment the counter by (defaults to 1). Return type: int Returns: Counter value after incrementing.
-
counter_set
(row, column, value=0)[source]# Set a counter column to a specific value.
Note
Be careful using this method. It can be useful for setting the initial value of a counter, but it defeats the purpose of using atomic increment and decrement.
Parameters: - row (str) – Row key where the counter is stored. - column (str) – Column within the row where the counter is stored. - value (int) – Value to set the counter to.
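The four counter methods compose as in this sketch (assuming `table` is a Table; the row key and column are placeholders, and the noted return values assume the counter starts from the explicit 0):

```python
def counter_example(table):
    table.counter_set('row-key', 'cf:hits', 0)   # explicit initial value
    table.counter_inc('row-key', 'cf:hits')      # returns 1
    table.counter_inc('row-key', 'cf:hits', 10)  # returns 11
    table.counter_dec('row-key', 'cf:hits', 2)   # returns 9
    return table.counter_get('row-key', 'cf:hits')
```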
-
delete
(row, columns=None, timestamp=None, wal=<object object>)[source]# Delete data from a row in this table.
This method deletes the entire
row
ifcolumns
is not specified.Note
This method will send a request with a single delete mutation. In many situations,
batch()
is a more appropriate method to manipulate data since it helps combine many mutations into a single request.Parameters: - row (str) – The row key where the delete will occur.
- columns (list) –
(Optional) Iterable containing column names (as strings). Each column name can be either
- an entire column family:
fam
orfam:
- a single column:
fam:col
- an entire column family:
- timestamp (int) – (Optional) Timestamp (in milliseconds since the epoch) that the mutation will be applied at.
- wal (object) – Unused parameter (to be passed to a created batch). Provided for compatibility with HappyBase, but irrelevant for Cloud Bigtable since it does not have a Write Ahead Log.
-
families
()[source]# Retrieve the column families for this table.
Return type: dict Returns: Mapping from column family name to garbage collection rule for a column family.
-
put
(row, data, timestamp=None, wal=<object object>)[source]# Insert data into a row in this table.
Note
This method will send a request with a single “put” mutation. In many situations,
batch()
is a more appropriate method to manipulate data since it helps combine many mutations into a single request.Parameters: - row (str) – The row key where the mutation will be “put”.
- data (dict) – Dictionary containing the data to be inserted. The keys
are columns names (of the form
fam:col
) and the values are strings (bytes) to be stored in those columns. - timestamp (int) – (Optional) Timestamp (in milliseconds since the epoch) that the mutation will be applied at.
- wal (object) – Unused parameter (to be passed to a created batch). Provided for compatibility with HappyBase, but irrelevant for Cloud Bigtable since it does not have a Write Ahead Log.
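A sketch of single-row writes (assuming `table` is a Table; names and values are placeholders, and batch() remains preferable when several mutations can be combined):

```python
def write_then_trim(table):
    # One "put" mutation storing two cells in the 'cf' family.
    table.put('row-key', {'cf:col1': 'value1', 'cf:col2': 'value2'})

    # Delete a single column; omitting `columns` deletes the whole row.
    table.delete('row-key', columns=['cf:col1'])
```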
-
regions
()[source]# Retrieve the regions for this table.
Warning
Cloud Bigtable does not give information about how a table is laid out in memory, so this method does not work. It is provided simply for compatibility.
Raises: NotImplementedError
always
-
row
(row, columns=None, timestamp=None, include_timestamp=False)[source]# Retrieve a single row of data.
Returns the latest cells in each column (or all columns if
columns
is not specified). If atimestamp
is set, then latest becomes latest up untiltimestamp
.Parameters: - row (str) – Row key for the row we are reading from.
- columns (list) –
(Optional) Iterable containing column names (as strings). Each column name can be either
- an entire column family:
fam
orfam:
- a single column:
fam:col
- an entire column family:
- timestamp (int) – (Optional) Timestamp (in milliseconds since the epoch). If specified, only cells returned before (or at) the timestamp will be returned.
- include_timestamp (bool) – Flag to indicate if cell timestamps should be included with the output.
Return type: dict Returns: Dictionary containing all the latest column values in the row.
-
rows
(rows, columns=None, timestamp=None, include_timestamp=False)[source]# Retrieve multiple rows of data.
All optional arguments behave the same in this method as they do in
row()
.Parameters: - rows (list) – Iterable of the row keys for the rows we are reading from.
- columns (list) –
(Optional) Iterable containing column names (as strings). Each column name can be either
- an entire column family:
fam
orfam:
- a single column:
fam:col
- an entire column family:
- timestamp (int) – (Optional) Timestamp (in milliseconds since the epoch). If specified, only cells returned before (or at) the timestamp will be returned.
- include_timestamp (bool) – Flag to indicate if cell timestamps should be included with the output.
Return type: list Returns: A list of pairs, where the first is the row key and the second is a dictionary with the filtered values returned.
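The read methods compose as in this sketch (assuming `table` is a Table; keys and column names are placeholders):

```python
def read_examples(table):
    # Latest cell per column for one row, restricted to one column.
    data = table.row('row-key', columns=['cf:col1'])

    # Several rows at once: a list of (row_key, data) pairs.
    pairs = table.rows(['row-1', 'row-2'])

    # Up to three historical versions of a single cell, timestamped.
    cells = table.cells('row-key', 'cf:col1', versions=3,
                        include_timestamp=True)
    return data, pairs, cells
```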
-
scan
(row_start=None, row_stop=None, row_prefix=None, columns=None, timestamp=None, include_timestamp=False, limit=None, **kwargs)[source]# Create a scanner for data in this table.
This method returns a generator that can be used for looping over the matching rows.
If
row_prefix
is specified, only rows with row keys matching the prefix will be returned. If given,row_start
androw_stop
cannot be used.Note
Both
row_start
androw_stop
can beNone
to specify the start and the end of the table respectively. If both are omitted, a full table scan is done. Note that this usually results in severe performance problems.The keyword argument
filter
is also supported (beyond column and row range filters supported here). HappyBase / HBase users will have used this as an HBase filter string. (See the Thrift docs for more details on those filters.) However, Google Cloud Bigtable doesn’t support those filter strings so aRowFilter
should be used instead.The arguments
batch_size
,scan_batching
andsorted_columns
are allowed (as keyword arguments) for compatibility with HappyBase. However, they will not be used in any way, and will cause a warning if passed. (Thebatch_size
determines the number of results to retrieve per request. The HBase scanner defaults to reading one record at a time, so this argument allows HappyBase to increase that number. However, the Cloud Bigtable API uses HTTP/2 streaming so there is no concept of a batched scan. Thesorted_columns
flag tells HBase to return columns in order, but Cloud Bigtable doesn’t have this feature.)Parameters: - row_start (str) – (Optional) Row key where the scanner should start
(includes
row_start
). If not specified, reads from the first key. If the table does not containrow_start
, it will start from the next key after it that is contained in the table. - row_stop (str) – (Optional) Row key where the scanner should stop
(excludes
row_stop
). If not specified, reads until the last key. The table does not have to containrow_stop
. - row_prefix (str) – (Optional) Prefix to match row keys.
- columns (list) –
(Optional) Iterable containing column names (as strings). Each column name can be either
- an entire column family:
fam
orfam:
- a single column:
fam:col
- an entire column family:
- timestamp (int) – (Optional) Timestamp (in milliseconds since the epoch). If specified, only cells returned before (or at) the timestamp will be returned.
- include_timestamp (bool) – Flag to indicate if cell timestamps should be included with the output.
- limit (int) – (Optional) Maximum number of rows to return.
- kwargs (dict) – Remaining keyword arguments. Provided for HappyBase compatibility.
Raises: ValueError if limit is set but non-positive, or if row_prefix is used with row_start / row_stop; TypeError if a string filter is used.
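A sketch of a scan using a RowFilter in place of an HBase filter string (assuming `table` is a Table; the filter class and its import path come from the google-cloud-bigtable package, and the prefix and qualifier pattern are placeholders):

```python
def scan_with_filter(table):
    # Deferred import so the sketch reads standalone.
    from google.cloud.bigtable.row_filters import ColumnQualifierRegexFilter

    # Keep only cells whose column qualifier starts with 'col'.
    qualifier_filter = ColumnQualifierRegexFilter(b'col')
    for row_key, row_data in table.scan(row_prefix='row-',
                                        filter=qualifier_filter,
                                        limit=10):
        print(row_key, row_data)
```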
-
google.cloud.happybase.table.
make_ordered_row
(sorted_columns, include_timestamp)[source]# Make a row dict for sorted Thrift column results from scans.
Warning
This method is only provided for HappyBase compatibility, but does not actually work.
Parameters: Raises: NotImplementedError
always
-
google.cloud.happybase.table.
make_row
(cell_map, include_timestamp)[source]# Make a row dict for a Thrift cell mapping.
Warning
This method is only provided for HappyBase compatibility, but does not actually work.
Parameters: Raises: NotImplementedError
always
HappyBase Batch#
Google Cloud Bigtable HappyBase batch module.
-
class
google.cloud.happybase.batch.
Batch
(table, timestamp=None, batch_size=None, transaction=False, wal=<object object>)[source]# Bases:
object
Batch class for accumulating mutations.
Note
When using a batch with
transaction=False
as a context manager (i.e. in awith
statement), mutations will still be sent as row mutations even if the context manager exits with an error. This behavior is in place to match the behavior in the HappyBase HBase / Thrift implementation.Parameters: - table (
Table
) – The table where mutations will be applied. - timestamp (int) – (Optional) Timestamp (in milliseconds since the epoch) that all mutations will be applied at.
- batch_size (int) – (Optional) The maximum number of mutations to allow to accumulate before committing them.
- transaction (bool) – Flag indicating if the mutations should be sent
transactionally or not. If
transaction=True
and an error occurs while aBatch
is active, then none of the accumulated mutations will be committed. Ifbatch_size
is set, the mutation can’t be transactional. - wal (object) – Unused parameter (Boolean for using the HBase Write Ahead Log). Provided for compatibility with HappyBase, but irrelevant for Cloud Bigtable since it does not have a Write Ahead Log.
Raises: TypeError
ifbatch_size
is set andtransaction=True
.ValueError
ifbatch_size
is not positive.-
delete
(row, columns=None, wal=<object object>)[source]# Delete data from a row in the table owned by this batch.
Parameters: - row (str) – The row key where the delete will occur.
- columns (list) –
(Optional) Iterable containing column names (as strings). Each column name can be either
- an entire column family:
fam
orfam:
- a single column:
fam:col
If not used, will delete the entire row.
- an entire column family:
- wal (object) – Unused parameter (to over-ride the default on the instance). Provided for compatibility with HappyBase, but irrelevant for Cloud Bigtable since it does not have a Write Ahead Log.
Raises: ValueError – If the delete timestamp range is set on the current batch, but a full row delete is attempted.
-
put
(row, data, wal=<object object>)[source]# Insert data into a row in the table owned by this batch.
Parameters: - row (str) – The row key where the mutation will be “put”.
- data (dict) – Dictionary containing the data to be inserted. The keys
are columns names (of the form
fam:col
) and the values are strings (bytes) to be stored in those columns. - wal (object) – Unused parameter (to over-ride the default on the instance). Provided for compatibility with HappyBase, but irrelevant for Cloud Bigtable since it does not have a Write Ahead Log.
Getting started#
The google-cloud-happybase
library is pip
install-able:
$ pip install google-cloud-happybase