HappyBase Package

Google Cloud Bigtable HappyBase package.

This package is intended to emulate the HappyBase library using Google Cloud Bigtable as the backing store.

Differences in Public API

Some concepts from HBase/Thrift do not map directly to the Cloud Bigtable API. As a result:

  • Table.regions() could not be implemented since tables in Cloud Bigtable do not expose internal storage details
  • Connection.enable_table() does nothing since Cloud Bigtable has no concept of enabled/disabled
  • Connection.disable_table() does nothing since Cloud Bigtable has no concept of enabled/disabled
  • Connection.is_table_enabled() always returns True since Cloud Bigtable has no concept of enabled/disabled
  • Connection.compact_table() does nothing since Cloud Bigtable handles table compactions automatically and does not expose an API for it
  • The __version__ value for the HappyBase package is None. However, it’s worth noting that this implementation was based on HappyBase 0.9.
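
A minimal sketch of these no-ops, assuming connection is an open Connection (see the constructor example in the next section) and 'my-table' is an existing table:

    from google.cloud import happybase

    print(happybase.__version__)                    # None
    connection.enable_table('my-table')             # no-op
    connection.disable_table('my-table')            # no-op
    print(connection.is_table_enabled('my-table'))  # always True
    connection.compact_table('my-table')            # no-op; compactions happen automatically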

In addition, many of the constants from the connection module are specific to HBase and are defined as None in our module:

  • COMPAT_MODES
  • THRIFT_TRANSPORTS
  • THRIFT_PROTOCOLS
  • DEFAULT_HOST
  • DEFAULT_PORT
  • DEFAULT_TRANSPORT
  • DEFAULT_COMPAT
  • DEFAULT_PROTOCOL

Two of these, DEFAULT_HOST and DEFAULT_PORT, are even imported in the main happybase package.
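
A quick sketch of what these placeholders look like in practice (the import path assumes the connection module mentioned above):

    from google.cloud.happybase.connection import (
        COMPAT_MODES, DEFAULT_HOST, DEFAULT_PORT, THRIFT_TRANSPORTS)

    # None of these carry any HBase/Thrift meaning in this package.
    assert COMPAT_MODES is None
    assert THRIFT_TRANSPORTS is None
    assert DEFAULT_HOST is None and DEFAULT_PORT is None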

Finally, we do not provide the util module. Though it is public in the HappyBase library, it provides no core functionality.

API Behavior Changes

  • Since there is no concept of an enabled / disabled table, calling Connection.delete_table() with disable=True can’t be supported. Using that argument will result in a warning.
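
    For example (a sketch; connection and the table name are assumed, as in the constructor example below):

    connection.delete_table('my-table', disable=True)  # warns that disable is ignored; the table is deleted anyway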

  • The Connection constructor disables the use of several arguments and will print a warning if any of them are passed in as keyword arguments. The arguments are:

    • host
    • port
    • compat
    • transport
    • protocol
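
    For example (a sketch; instance is a Cloud Bigtable Instance, constructed as in the next example):

    from google.cloud.happybase import Connection

    # host and port are ignored; a warning is printed for each.
    connection = Connection(instance=instance, host='localhost', port=9090)
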
  • In order to make Connection compatible with Cloud Bigtable, we add an instance keyword argument to allow users to pass in their own Instance (which they can construct beforehand).

    For example:

    from google.cloud.bigtable.client import Client

    # PROJECT_ID, instance_id and location_id are placeholders for your
    # project, instance and zone; admin=True enables the table admin API.
    client = Client(project=PROJECT_ID, admin=True)
    instance = client.instance(instance_id, location_id)
    instance.reload()

    from google.cloud.happybase import Connection
    connection = Connection(instance=instance)
    
  • Any uses of the wal (Write Ahead Log) argument, in any method that accepts it, will result in a warning as well.
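
    For example (a sketch; table is assumed to come from connection.table('my-table')):

    table.put(b'row-key', {b'cf1:col': b'value'}, wal=False)  # warns that wal is ignored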

  • When calling Connection.create_table(), the majority of HBase column family options cannot be used. Among them:

    • max_versions
    • compression
    • in_memory
    • bloom_filter_type
    • bloom_filter_vector_size
    • bloom_filter_nb_hashes
    • block_cache_enabled
    • time_to_live

    Only max_versions and time_to_live are available in Cloud Bigtable (as MaxVersionsGCRule and MaxAgeGCRule, respectively).

    In addition to using a dictionary for specifying column family options, we also accept instances of GarbageCollectionRule or subclasses.
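
    For example (a sketch; the table and family names are arbitrary, and time_to_live is assumed to be in seconds, as in HBase):

    from datetime import timedelta
    from google.cloud.bigtable.column_family import MaxAgeGCRule, MaxVersionsGCRule

    families = {
        'cf1': {'max_versions': 10, 'time_to_live': 3600},  # dictionary of options
        'cf2': MaxVersionsGCRule(1),                        # a GarbageCollectionRule subclass instance
        'cf3': MaxAgeGCRule(timedelta(days=7)),             # another GC rule, equivalent to a time_to_live
    }
    connection.create_table('my-table', families)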

  • Table.scan() no longer accepts the following arguments (which will result in a warning):

    • batch_size
    • scan_batching
    • sorted_columns
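
    For example (a sketch; each of these arguments results in a warning):

    rows = list(table.scan(batch_size=100, sorted_columns=True))
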
  • Using an HBase filter string in Table.scan() is not possible with Cloud Bigtable and will result in a TypeError. However, the method now accepts instances of RowFilter and subclasses.
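
    For example (a sketch; the filter and row-key prefix are illustrative):

    from google.cloud.bigtable.row_filters import RowKeyRegexFilter

    for row_key, data in table.scan(filter=RowKeyRegexFilter(b'^user-')):
        print(row_key, data)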

  • Batch.delete() (and hence Table.delete()) will fail with a ValueError when either a row or column family delete is attempted with a timestamp. This is because the Cloud Bigtable API uses the DeleteFromFamily and DeleteFromRow mutations for these deletes, and neither of these mutations supports a timestamp.
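
    For example (a sketch; the row key, family and column are placeholders):

    table.delete(b'row-key', timestamp=1234567890)                    # ValueError: row delete with a timestamp
    table.delete(b'row-key', columns=[b'cf1'], timestamp=1234567890)  # ValueError: family delete with a timestamp
    table.delete(b'row-key', columns=[b'cf1:col'])                    # fine; no timestamp involved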