HappyBase Package#

Google Cloud Bigtable HappyBase package.

This package is intended to emulate the HappyBase library using Google Cloud Bigtable as the backing store.

Differences in Public API#

Some concepts from HBase/Thrift do not map directly to the Cloud Bigtable API. As a result:

  • Table.regions() could not be implemented since tables in Cloud Bigtable do not expose internal storage details
  • Connection.enable_table() does nothing since Cloud Bigtable has no concept of enabled/disabled
  • Connection.disable_table() does nothing since Cloud Bigtable has no concept of enabled/disabled
  • Connection.is_table_enabled() always returns True since Cloud Bigtable has no concept of enabled/disabled
  • Connection.compact_table() does nothing since Cloud Bigtable handles table compactions automatically and does not expose an API for it
  • The __version__ value for the HappyBase package is None. However, it’s worth noting that this implementation was based on HappyBase 0.9.

In addition, many of the constants from connection are specific to HBase and are defined as None in our module:

  • COMPAT_MODES
  • THRIFT_TRANSPORTS
  • THRIFT_PROTOCOLS
  • DEFAULT_HOST
  • DEFAULT_PORT
  • DEFAULT_TRANSPORT
  • DEFAULT_COMPAT
  • DEFAULT_PROTOCOL

Two of these, DEFAULT_HOST and DEFAULT_PORT, are even imported in the main happybase package.

Finally, we do not provide the util module. Though it is public in the HappyBase library, it provides no core functionality.

API Behavior Changes#

  • Since there is no concept of an enabled / disabled table, calling Connection.delete_table() with disable=True can’t be supported. Using that argument will result in a warning.

  • The Connection constructor disables the use of several arguments and will print a warning if any of them are passed in as keyword arguments. The arguments are:

    • host
    • port
    • compat
    • transport
    • protocol
  • In order to make Connection compatible with Cloud Bigtable, we add an instance keyword argument to allow users to pass in their own Instance (which they can construct beforehand).

    For example:

    from google.cloud.bigtable.client import Client
    client = Client(project=PROJECT_ID, admin=True)
    instance = client.instance(instance_id, location_id)
    instance.reload()
    
    from google.cloud.happybase import Connection
    connection = Connection(instance=instance)
    
  • Any uses of the wal (Write Ahead Log) argument will result in a warning as well. This includes uses in the Batch constructor, Batch.put(), Batch.delete(), Table.batch(), Table.put() and Table.delete().

  • When calling Connection.create_table(), the majority of HBase column family options cannot be used. Among those options are:

    • max_versions
    • compression
    • in_memory
    • bloom_filter_type
    • bloom_filter_vector_size
    • bloom_filter_nb_hashes
    • block_cache_enabled
    • time_to_live

    Only max_versions and time_to_live are available in Cloud Bigtable (as MaxVersionsGCRule and MaxAgeGCRule, respectively).

    In addition to using a dictionary for specifying column family options, we also accept instances of GarbageCollectionRule or subclasses.
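    As a sketch of the dictionary form (the table and family names below are hypothetical, and connection is assumed to be an open Connection):

```python
def create_events_table(connection):
    """Create a table using only the column family options Cloud Bigtable supports.

    `connection` is assumed to be an open happybase-style Connection; the
    table name "events" and the family names are made up for illustration.
    """
    families = {
        # Plain dicts: only max_versions and time_to_live are honored by
        # Cloud Bigtable; other HBase keys would trigger a warning.
        "stats": {"max_versions": 3},
        "raw": {"time_to_live": 60 * 60 * 24},  # one day, in seconds
    }
    connection.create_table("events", families)
```

    Equivalently, a value such as MaxVersionsGCRule(3) (a GarbageCollectionRule subclass) could be passed in place of the {"max_versions": 3} dictionary.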

  • Table.scan() no longer accepts the following arguments (which will result in a warning):

    • batch_size
    • scan_batching
    • sorted_columns
  • Using a HBase filter string in Table.scan() is not possible with Cloud Bigtable and will result in a TypeError. However, the method now accepts instances of RowFilter and subclasses.
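    For instance, a sketch of pushing a prebuilt filter down to Table.scan() (in real code row_filter would be a RowFilter instance, e.g. a row-key regex filter from the Cloud Bigtable client; here it is simply an opaque object handed through):

```python
def scan_with_filter(table, row_filter):
    """Scan `table`, passing `row_filter` to the backend.

    `row_filter` is assumed to be a RowFilter instance; an HBase filter
    string in its place would raise TypeError.
    """
    return [(key, data) for key, data in table.scan(filter=row_filter)]
```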

  • Batch.delete() (and hence Table.delete()) will fail with a ValueError when either a row or column family delete is attempted with a timestamp. This is because the Cloud Bigtable API uses the DeleteFromFamily and DeleteFromRow mutations for these deletes, and neither of these mutations support a timestamp.

HappyBase Connection#

Google Cloud Bigtable HappyBase connection module.

class google.cloud.happybase.connection.Connection(timeout=None, autoconnect=True, table_prefix=None, table_prefix_separator='_', instance=None, **kwargs)[source]#

Bases: object

Connection to Cloud Bigtable backend.

Note

If you pass an instance, it will be Instance.copy()-ed before being stored on the new connection. This also copies the Client that created the Instance and the Credentials stored on the client.

The arguments host, port, compat, transport and protocol are allowed (as keyword arguments) for compatibility with HappyBase. However, they will not be used in any way, and will cause a warning if passed.

Parameters:
  • timeout (int) – (Optional) The socket timeout in milliseconds.
  • autoconnect (bool) – (Optional) Whether the connection should be open()-ed during construction.
  • table_prefix (str) – (Optional) Prefix used to construct table names.
  • table_prefix_separator (str) – (Optional) Separator used with table_prefix. Defaults to _.
  • instance (Instance) – (Optional) A Cloud Bigtable instance. The instance also owns a client for making gRPC requests to the Cloud Bigtable API. If not passed in, defaults to creating client with admin=True and using the timeout here for the timeout_seconds argument to the Client constructor. The credentials for the client will be the implicit ones loaded from the environment. Then that client is used to retrieve all the instances owned by the client’s project.
  • kwargs (dict) – Remaining keyword arguments. Provided for HappyBase compatibility.
close()[source]#

Close the underlying transport to Cloud Bigtable.

This method does nothing and is provided for compatibility.

static compact_table(name, major=False)[source]#

Compact the specified table.

Warning

Cloud Bigtable performs table compactions automatically but does not expose an API for the feature, so this method does nothing. It is provided simply for compatibility.

Parameters:
  • name (str) – The name of the table to compact.
  • major (bool) – Whether to perform a major compaction.
create_table(name, families)[source]#

Create a table.

Warning

The only column family options from HappyBase that are able to be used with Cloud Bigtable are max_versions and time_to_live.

Values in families represent column family options. In HappyBase, these are dictionaries, corresponding to the ColumnDescriptor structure in the Thrift API. The accepted keys are:

  • max_versions (int)
  • compression (str)
  • in_memory (bool)
  • bloom_filter_type (str)
  • bloom_filter_vector_size (int)
  • bloom_filter_nb_hashes (int)
  • block_cache_enabled (bool)
  • time_to_live (int)
Parameters:
  • name (str) – The name of the table to be created.
  • families (dict) –

    Dictionary with column family names as keys and column family options as the values. The options can be among

    • dict
    • GarbageCollectionRule
Raises:
  • TypeError – If families is not a dictionary.
  • ValueError – If families has no entries.
  • AlreadyExists – If creation fails due to an already existing table.
  • NetworkError – If creation fails for a reason other than table exists.
delete_table(name, disable=False)[source]#

Delete the specified table.

Parameters:
  • name (str) – The name of the table to be deleted. If table_prefix is set, a prefix will be added to the name.
  • disable (bool) – Whether to first disable the table if needed. This is provided for compatibility with HappyBase, but is not relevant for Cloud Bigtable since it has no concept of enabled / disabled tables.
static disable_table(name)[source]#

Disable the specified table.

Warning

Cloud Bigtable has no concept of enabled / disabled tables so this method does nothing. It is provided simply for compatibility.

Parameters:name (str) – The name of the table to be disabled.
static enable_table(name)[source]#

Enable the specified table.

Warning

Cloud Bigtable has no concept of enabled / disabled tables so this method does nothing. It is provided simply for compatibility.

Parameters:name (str) – The name of the table to be enabled.
static is_table_enabled(name)[source]#

Return whether the specified table is enabled.

Warning

Cloud Bigtable has no concept of enabled / disabled tables so this method always returns True. It is provided simply for compatibility.

Parameters:name (str) – The name of the table to check enabled / disabled status.
Return type:bool
Returns:The value True always.
open()[source]#

Open the underlying transport to Cloud Bigtable.

This method does nothing and is provided for compatibility.

table(name, use_prefix=True)[source]#

Table factory.

Parameters:
  • name (str) – The name of the table to be created.
  • use_prefix (bool) – Whether to use the table prefix (if any).
Return type:

Table

Returns:

Table instance owned by this connection.

tables()[source]#

Return a list of table names available to this connection.

Note

This lists every table in the instance owned by this connection, not every table that a given user may have access to.

Note

If table_prefix is set on this connection, only returns the table names which match that prefix.

Return type:list
Returns:List of string table names.

HappyBase Connection Pool#

Google Cloud Bigtable HappyBase pool module.

class google.cloud.happybase.pool.ConnectionPool(size, **kwargs)[source]#

Bases: object

Thread-safe connection pool.

Note

All keyword arguments are passed unmodified to the Connection constructor except for autoconnect. This is because the open / closed status of a connection is managed by the pool. In addition, if instance is not passed, the default / inferred instance is determined by the pool and then passed to each Connection that is created.

Parameters:
  • size (int) – The maximum number of concurrently open connections.
  • kwargs (dict) – Keyword arguments passed to Connection constructor.
Raises:

TypeError if size is not an integer. ValueError if size is not positive.

connection(*args, **kwds)[source]#

Obtain a connection from the pool.

Must be used as a context manager, for example:

with pool.connection() as connection:
    pass  # do something with the connection

If timeout is omitted, this method waits forever for a connection to become available from the local queue.

Yields an active Connection from the pool.

Parameters:timeout (int) – (Optional) Time (in seconds) to wait for a connection to open.
Raises:NoConnectionsAvailable if no connection can be retrieved from the pool before the timeout (only if a timeout is specified).
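
As a sketch (the table name and row key are hypothetical, and pool is assumed to be an existing ConnectionPool):

```python
def fetch_row_with_deadline(pool, row_key):
    """Borrow a connection from `pool`, waiting at most 5 seconds.

    If the pool stays exhausted past the timeout, the
    NoConnectionsAvailable raised by pool.connection() propagates
    to the caller. The table name "my-table" is illustrative.
    """
    with pool.connection(timeout=5) as connection:
        table = connection.table("my-table")
        return table.row(row_key)
```
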
exception google.cloud.happybase.pool.NoConnectionsAvailable[source]#

Bases: exceptions.RuntimeError

Exception raised when no connections are available.

This happens if a timeout was specified when obtaining a connection, and no connection became available within the specified timeout.

HappyBase Table#

Google Cloud Bigtable HappyBase table module.

class google.cloud.happybase.table.Table(name, connection)[source]#

Bases: object

Representation of Cloud Bigtable table.

Used for adding and retrieving data.

Parameters:
  • name (str) – The name of the table.
  • connection (Connection) – The connection which has access to the table.
batch(timestamp=None, batch_size=None, transaction=False, wal=<object object>)[source]#

Create a new batch operation for this table.

This method returns a new Batch instance that can be used for mass data manipulation.

Parameters:
  • timestamp (int) – (Optional) Timestamp (in milliseconds since the epoch) that all mutations will be applied at.
  • batch_size (int) – (Optional) The maximum number of mutations to allow to accumulate before committing them.
  • transaction (bool) – Flag indicating if the mutations should be sent transactionally or not. If transaction=True and an error occurs while a Batch is active, then none of the accumulated mutations will be committed. If batch_size is set, the mutation can’t be transactional.
  • wal (object) – Unused parameter (to be passed to the created batch). Provided for compatibility with HappyBase, but irrelevant for Cloud Bigtable since it does not have a Write Ahead Log.
Return type:

Batch

Returns:

A batch bound to this table.
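
As a sketch of typical use (row keys and column names are hypothetical; the batch is used as a context manager so send() runs when the with block exits):

```python
def write_two_rows(table):
    """Accumulate two puts and commit them in a single request.

    `table` is assumed to be a happybase-style Table; the row keys and
    the "cf1:name" column are made up for illustration.
    """
    with table.batch() as batch:
        batch.put("row-1", {"cf1:name": "first"})
        batch.put("row-2", {"cf1:name": "second"})
```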

cells(row, column, versions=None, timestamp=None, include_timestamp=False)[source]#

Retrieve multiple versions of a single cell from the table.

Parameters:
  • row (str) – Row key for the row we are reading from.
  • column (str) – Column we are reading from; of the form fam:col.
  • versions (int) – (Optional) The maximum number of cells to return. If not set, returns all cells found.
  • timestamp (int) – (Optional) Timestamp (in milliseconds since the epoch). If specified, only cells returned before (or at) the timestamp will be returned.
  • include_timestamp (bool) – Flag to indicate if cell timestamps should be included with the output.
Return type:

list

Returns:

List of values in the cell (with timestamps if include_timestamp is True).
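
For example, a sketch of reading recent versions (the "cf1:col1" column is hypothetical; with include_timestamp=True each list item is a (value, timestamp) pair rather than a bare value):

```python
def latest_two_values(table, row_key):
    """Fetch the two most recent versions of a single cell.

    `table` is assumed to be a happybase-style Table.
    """
    return table.cells(row_key, "cf1:col1", versions=2,
                       include_timestamp=True)
```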

counter_dec(row, column, value=1)[source]#

Atomically decrement a counter column.

This method atomically decrements a counter column in row. If the counter column does not exist, it is automatically initialized to 0 before being decremented.

Parameters:
  • row (str) – Row key for the row we are decrementing a counter in.
  • column (str) – Column we are decrementing a value in; of the form fam:col.
  • value (int) – Amount to decrement the counter by. (If negative, this is equivalent to increment.)
Return type:

int

Returns:

Counter value after decrementing.

counter_get(row, column)[source]#

Retrieve the current value of a counter column.

This method retrieves the current value of a counter column. If the counter column does not exist, this function initializes it to 0.

Note

Application code should never store a counter value directly; use the atomic counter_inc() and counter_dec() methods for that.

Parameters:
  • row (str) – Row key for the row we are getting a counter from.
  • column (str) – Column we are reading the counter from; of the form fam:col.
Return type:

int

Returns:

Counter value (after initializing / incrementing by 0).

counter_inc(row, column, value=1)[source]#

Atomically increment a counter column.

This method atomically increments a counter column in row. If the counter column does not exist, it is automatically initialized to 0 before being incremented.

Parameters:
  • row (str) – Row key for the row we are incrementing a counter in.
  • column (str) – Column we are incrementing a value in; of the form fam:col.
  • value (int) – Amount to increment the counter by. (If negative, this is equivalent to decrement.)
Return type:

int

Returns:

Counter value after incrementing.
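
The increment/decrement pair composes naturally; as a sketch (the "stats:views" column is hypothetical):

```python
def adjust_view_count(table, row_key, delta):
    """Move a counter up or down atomically and return the new value.

    counter_inc / counter_dec initialize a missing counter to 0 before
    applying `delta`; the column name is made up for illustration.
    """
    if delta >= 0:
        return table.counter_inc(row_key, "stats:views", value=delta)
    return table.counter_dec(row_key, "stats:views", value=-delta)
```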

counter_set(row, column, value=0)[source]#

Set a counter column to a specific value.

Note

Be careful using this method. It can be useful for setting the initial value of a counter, but it defeats the purpose of using atomic increment and decrement.

Parameters:
  • row (str) – Row key for the row we are setting a counter in.
  • column (str) – Column we are setting a value in; of the form fam:col.
  • value (int) – Value to set the counter to.
delete(row, columns=None, timestamp=None, wal=<object object>)[source]#

Delete data from a row in this table.

This method deletes the entire row if columns is not specified.

Note

This method will send a request with a single delete mutation. In many situations, batch() is a more appropriate method to manipulate data since it helps combine many mutations into a single request.

Parameters:
  • row (str) – The row key where the delete will occur.
  • columns (list) –

    (Optional) Iterable containing column names (as strings). Each column name can be either

    • an entire column family: fam or fam:
    • a single column: fam:col
  • timestamp (int) – (Optional) Timestamp (in milliseconds since the epoch) that the mutation will be applied at.
  • wal (object) – Unused parameter (to be passed to a created batch). Provided for compatibility with HappyBase, but irrelevant for Cloud Bigtable since it does not have a Write Ahead Log.
families()[source]#

Retrieve the column families for this table.

Return type:dict
Returns:Mapping from column family name to garbage collection rule for a column family.
put(row, data, timestamp=None, wal=<object object>)[source]#

Insert data into a row in this table.

Note

This method will send a request with a single “put” mutation. In many situations, batch() is a more appropriate method to manipulate data since it helps combine many mutations into a single request.

Parameters:
  • row (str) – The row key where the mutation will be “put”.
  • data (dict) – Dictionary containing the data to be inserted. The keys are columns names (of the form fam:col) and the values are strings (bytes) to be stored in those columns.
  • timestamp (int) – (Optional) Timestamp (in milliseconds since the epoch) that the mutation will be applied at.
  • wal (object) – Unused parameter (to be passed to a created batch). Provided for compatibility with HappyBase, but irrelevant for Cloud Bigtable since it does not have a Write Ahead Log.
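
As a sketch of a single put (the row key layout and column names are hypothetical; values are plain strings/bytes since Cloud Bigtable stores raw bytes):

```python
def record_login(table, user_id):
    """Write one row with two columns in a single "put" mutation.

    `table` is assumed to be a happybase-style Table; the "session"
    family and its columns are made up for illustration.
    """
    table.put("user#%s" % user_id, {
        "session:last_login": "2016-01-01T00:00:00Z",
        "session:count": "1",
    })
```
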
regions()[source]#

Retrieve the regions for this table.

Warning

Cloud Bigtable does not give information about how a table is laid out in memory, so this method does not work. It is provided simply for compatibility.

Raises:NotImplementedError always
row(row, columns=None, timestamp=None, include_timestamp=False)[source]#

Retrieve a single row of data.

Returns the latest cells in each column (or all columns if columns is not specified). If a timestamp is set, then latest becomes latest up until timestamp.

Parameters:
  • row (str) – Row key for the row we are reading from.
  • columns (list) –

    (Optional) Iterable containing column names (as strings). Each column name can be either

    • an entire column family: fam or fam:
    • a single column: fam:col
  • timestamp (int) – (Optional) Timestamp (in milliseconds since the epoch). If specified, only cells returned before the timestamp will be returned.
  • include_timestamp (bool) – Flag to indicate if cell timestamps should be included with the output.
Return type:

dict

Returns:

Dictionary containing all the latest column values in the row.

rows(rows, columns=None, timestamp=None, include_timestamp=False)[source]#

Retrieve multiple rows of data.

All optional arguments behave the same in this method as they do in row().

Parameters:
  • rows (list) – Iterable of the row keys for the rows we are reading from.
  • columns (list) –

    (Optional) Iterable containing column names (as strings). Each column name can be either

    • an entire column family: fam or fam:
    • a single column: fam:col
  • timestamp (int) – (Optional) Timestamp (in milliseconds since the epoch). If specified, only cells returned before (or at) the timestamp will be returned.
  • include_timestamp (bool) – Flag to indicate if cell timestamps should be included with the output.
Return type:

list

Returns:

A list of pairs, where the first is the row key and the second is a dictionary with the filtered values returned.

scan(row_start=None, row_stop=None, row_prefix=None, columns=None, timestamp=None, include_timestamp=False, limit=None, **kwargs)[source]#

Create a scanner for data in this table.

This method returns a generator that can be used for looping over the matching rows.

If row_prefix is specified, only rows with row keys matching the prefix will be returned. If given, row_start and row_stop cannot be used.

Note

Both row_start and row_stop can be None to specify the start and the end of the table respectively. If both are omitted, a full table scan is done. Note that this usually results in severe performance problems.

The keyword argument filter is also supported (beyond column and row range filters supported here). HappyBase / HBase users will have used this as an HBase filter string. (See the Thrift docs for more details on those filters.) However, Google Cloud Bigtable doesn’t support those filter strings so a RowFilter should be used instead.

The arguments batch_size, scan_batching and sorted_columns are allowed (as keyword arguments) for compatibility with HappyBase. However, they will not be used in any way, and will cause a warning if passed. (The batch_size determines the number of results to retrieve per request. The HBase scanner defaults to reading one record at a time, so this argument allows HappyBase to increase that number. However, the Cloud Bigtable API uses HTTP/2 streaming so there is no concept of a batched scan. The sorted_columns flag tells HBase to return columns in order, but Cloud Bigtable doesn’t have this feature.)

Parameters:
  • row_start (str) – (Optional) Row key where the scanner should start (includes row_start). If not specified, reads from the first key. If the table does not contain row_start, it will start from the next key after it that is contained in the table.
  • row_stop (str) – (Optional) Row key where the scanner should stop (excludes row_stop). If not specified, reads until the last key. The table does not have to contain row_stop.
  • row_prefix (str) – (Optional) Prefix to match row keys.
  • columns (list) –

    (Optional) Iterable containing column names (as strings). Each column name can be either

    • an entire column family: fam or fam:
    • a single column: fam:col
  • timestamp (int) – (Optional) Timestamp (in milliseconds since the epoch). If specified, only cells returned before (or at) the timestamp will be returned.
  • include_timestamp (bool) – Flag to indicate if cell timestamps should be included with the output.
  • limit (int) – (Optional) Maximum number of rows to return.
  • kwargs (dict) – Remaining keyword arguments. Provided for HappyBase compatibility.
Raises:

ValueError – If limit is set but non-positive, or if row_prefix is used with row_start / row_stop. TypeError – If a string filter is used.
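
As a sketch of consuming the generator for a key-prefix scan (the prefix and row keys are hypothetical; row_prefix cannot be combined with row_start / row_stop):

```python
def first_rows_with_prefix(table, prefix, max_rows):
    """Collect at most `max_rows` rows whose keys share `prefix`.

    `table` is assumed to be a happybase-style Table; scan() yields
    (row_key, row_dict) pairs lazily, and `limit` caps the yield count.
    """
    return list(table.scan(row_prefix=prefix, limit=max_rows))
```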

google.cloud.happybase.table.make_ordered_row(sorted_columns, include_timestamp)[source]#

Make a row dict for sorted Thrift column results from scans.

Warning

This method is only provided for HappyBase compatibility, but does not actually work.

Parameters:
  • sorted_columns (list) – List of TColumn instances from Thrift.
  • include_timestamp (bool) – Flag to indicate if cell timestamps should be included with the output.
Raises:

NotImplementedError always

google.cloud.happybase.table.make_row(cell_map, include_timestamp)[source]#

Make a row dict for a Thrift cell mapping.

Warning

This method is only provided for HappyBase compatibility, but does not actually work.

Parameters:
  • cell_map (dict) – Dictionary with fam:col strings as keys and TCell instances as values.
  • include_timestamp (bool) – Flag to indicate if cell timestamps should be included with the output.
Raises:

NotImplementedError always

HappyBase Batch#

Google Cloud Bigtable HappyBase batch module.

class google.cloud.happybase.batch.Batch(table, timestamp=None, batch_size=None, transaction=False, wal=<object object>)[source]#

Bases: object

Batch class for accumulating mutations.

Note

When using a batch with transaction=False as a context manager (i.e. in a with statement), mutations will still be sent as row mutations even if the context manager exits with an error. This behavior is in place to match the behavior in the HappyBase HBase / Thrift implementation.

Parameters:
  • table (Table) – The table where mutations will be applied.
  • timestamp (int) – (Optional) Timestamp (in milliseconds since the epoch) that all mutations will be applied at.
  • batch_size (int) – (Optional) The maximum number of mutations to allow to accumulate before committing them.
  • transaction (bool) – Flag indicating if the mutations should be sent transactionally or not. If transaction=True and an error occurs while a Batch is active, then none of the accumulated mutations will be committed. If batch_size is set, the mutation can’t be transactional.
  • wal (object) – Unused parameter (Boolean for using the HBase Write Ahead Log). Provided for compatibility with HappyBase, but irrelevant for Cloud Bigtable since it does not have a Write Ahead Log.
Raises:

TypeError if batch_size is set and transaction=True. ValueError if batch_size is not positive.
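
As a sketch of a transactional batch used as a context manager (row key and columns are hypothetical; with transaction=True, an exception raised inside the with block discards all accumulated mutations):

```python
def transactional_update(batch, row_key, data):
    """Apply a put and a delete that should commit together.

    `batch` is assumed to be a Batch constructed with transaction=True;
    the "cf1:obsolete" column is made up for illustration.
    """
    with batch as b:
        b.put(row_key, data)
        b.delete(row_key, columns=["cf1:obsolete"])
```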

delete(row, columns=None, wal=<object object>)[source]#

Delete data from a row in the table owned by this batch.

Parameters:
  • row (str) – The row key where the delete will occur.
  • columns (list) –

    (Optional) Iterable containing column names (as strings). Each column name can be either

    • an entire column family: fam or fam:
    • a single column: fam:col

    If not used, will delete the entire row.

  • wal (object) – Unused parameter (to over-ride the default on the instance). Provided for compatibility with HappyBase, but irrelevant for Cloud Bigtable since it does not have a Write Ahead Log.
Raises:

ValueError – If the delete timestamp range is set on the current batch, but a full row delete is attempted.

put(row, data, wal=<object object>)[source]#

Insert data into a row in the table owned by this batch.

Parameters:
  • row (str) – The row key where the mutation will be “put”.
  • data (dict) – Dictionary containing the data to be inserted. The keys are columns names (of the form fam:col) and the values are strings (bytes) to be stored in those columns.
  • wal (object) – Unused parameter (to over-ride the default on the instance). Provided for compatibility with HappyBase, but irrelevant for Cloud Bigtable since it does not have a Write Ahead Log.
send()[source]#

Send / commit the batch of mutations to the server.

Getting started#

The google-cloud-happybase library is pip install-able:

$ pip install google-cloud-happybase