1 of 3

Kafka Topic

Overview

Apache Kafka is an end-to-end event streaming platform that:

Publishes (writes) and subscribes to (reads) streams of events from sources like databases, cloud services, and software applications.
Stores these events durably and reliably for as long as you want.
Processes and reacts to the event streams in real-time and retrospectively.

Those events are organized and durably stored in topics. These topics are then partitioned over a number of buckets located on different Kafka brokers.

Event streaming thus ensures a continuous flow and interpretation of data so that the right information is at the right place, at the right time for your key use cases.

Example use case

You currently use Kafka to store the metrics for user logins, but being stuck in the Kafka silo means that you can't easily use this data across a range of business use cases or teams. You can use a batch sync to liberate your data into Cinchy.

The Kafka Topic source supports real-time syncs.

Info tab

You can find the parameters in the Info tab below (Image 1).

Values

Source tab

The following table outlines the mandatory and optional parameters you will find on the Source tab (Image 2).

The following parameters will help to define your data sync source and how it functions.

To set up a real-time sync, you must configure your Listener values. You can do so through the Connections UI.

Note that If there is more than one listener associated with your data sync, you will need to configure the addition listeners via the Listener Configuration table.

Reset behaviour

Topic JSON

The below table can be used to help create your Topic JSON needed to set up a real-time sync.

Example Topic JSON

{
  "topicName": "<(mandatory) kafka topic name to listen messages on>",
  "messageFormat": "<(optional) Put "AVRO" if the messages are serialized in AVRO>"
}

Connection attributes

The below table can be used to help create your Connection Attributes JSON needed to set up a real-time sync.

{
  "bootstrapServers": "< (mandatory) kafka bootstrap servers in a comma-separated list in the form of host:port>",
  "saslMechanism": "<PLAIN|SCRAM-SHA-256|SCRAM-SHA-512>",
  "saslPassword": "",
  "saslUsername": "",
  "schemaRegistrySettings": {
    "url": "<(optional) This is required if your data follows a schema when serialized in Avro. A comma-separated list of URLs for schema registry instances that are used to register or lookup schemas. >",
   "basicAuthCredentialsSource": "<(optional) Specifies the Kafka configuration property "schema.registry.basic.auth.credentials.source" that provides the basic authentication credentials, this can be "UserInfo" | "SaslInherit">",
   "basicAuthUserInfo": "<(optional) Basic Auth credentials specified in the form of username:password>",
   "sslKeystorePassword": "<(optional) The client keystore (PKCS#12) password>"
  }
  "securityProtocol": "Plaintext | SaslPlaintext | SaslSsl"
}

The Schema section is where you define which source columns you want to sync in your connection. You can repeat the values for multiple columns.

Select Show Advanced for more options for the Schema section.

You can choose to add in a Transformation > String Replacement by inputting the following:

Note that you can have more than one String Replacement

You have the option to add a source filter to your data sync. Please review the documentation here for more information on source filters.

Next steps

Configure your Destination
Define your Sync Actions.
Add in your Post Sync Scripts, if required.
If more than one listener is needed for a real-time sync, configure it/them via the Listener Config table.
To run a real-time sync, enable your Listener from the Execution tab.

Kafka Topic example config

Overview

This example syncs from a Kafka Topic source to a Cinchy Table target.

We want to sync the following data from Kafka and map it to the appropriate column in the "Sync Target 2" table in the "Kafka Sync" domain.

UI Example

This is what the Connections UI will look like with the aforementioned example parameters and data.

Source tab

Your source tab should be set to "Kafka Topic" and have the following information (Image 1):

Tip: Click on an image in this document to enlarge it.

Destination tab

Your destination tab should be set to Cinchy Table, and have the following information (Image 2):

Domain: The domain where your destination table resides. This example uses the "Kafka Sync" domain.

Table: The name of your destination table. This example uses the "Sync Target 2" table.

Degree of Parallelism: This is the number of parallel batch inserts and updates that can be run. Set this to 1 for our example.

Sync behaviour

Under the Sync Behaviour tab, we want to use the following parameters:

Synchronization Pattern: Full File
Sync Key Column Reference Name: Employee Id
New Record Behaviour: Insert
Dropped Record Behaviour: Delete
Change Record Behavior: Update

XML example

The following code is what the XML for our example connection would look like:

<?xml version="1.0" encoding="utf-16"?>
<BatchDataSyncConfig name="Kafka Sync" version="1.0.0" xmlns="http://www.cinchy.co">
    <KafkaTopicDataSource>
        <Schema>
            <Column name="$.employeeId" label="Employee Id" dataType="Number" isMandatory="false" validateData="false"/>
            <Column name="$.name" label="Name" dataType="Text" trimWhitespace="true" isMandatory="false" validateData="false"/>
        </Schema>
    </KafkaTopicDataSource>
    <CinchyTableTarget reconcileData="true" domain="Kafka Sync" table="Sync Target 2" suppressDuplicateErrors="false" degreeOfParallelism="1">
        <ColumnMappings>
            <ColumnMapping sourceColumn="$.employeeId" targetColumn="Employee Id"/>
            <ColumnMapping sourceColumn="$.name" targetColumn="Name"/>
        </ColumnMappings>
        <SyncKey>
            <SyncKeyColumnReference name="Employee Id"/>
        </SyncKey>
        <NewRecordBehaviour type="INSERT"/>
        <DroppedRecordBehaviour type="DELETE"/>
        <ChangedRecordBehaviour type="UPDATE"/>
        <PostSyncScripts/>
    </CinchyTableTarget>
</BatchDataSyncConfig>

Apache AVRO data format

Introduction

Apache AVRO was added as an inbound data format in Cinchy v5.3.

Apache AVRO (inbound) is a data format with added integration with the Kafka Schema Registry, which helps enforce data governance within a Kafka architecture.

Avro is an open source data serialization system that helps with data exchange between systems, programming languages, and processing frameworks. Avro stores both the data definition and the data together in one message or file. Avro stores the data definition in JSON format making it easy to read and interpret; the data itself is stored in binary format making it compact and efficient.

Some of the benefits for using AVRO as a data format are:

It's compact
It has a direct mapping to/from JSON
It's fast
It has bindings for a wide variety of programming languages.

For more about AVRO and Kafka, read the documentation here.

Setting up the Connection to a Kafka Schema Registry

To set up the Apache AVRO connection to a Kafka Schema Registry, you will need to configure your Listener Configs table with the below specified attributes.

Listener Config Attributes

Topic Column

{ 
"topicName": "sampleavro", 
"messageFormat": "AVRO" 
}

Connections Attributes Column

{
  "bootstrapServers": "host:port",
  "schemaRegistrySettings": {
    "url": "URL"
    "basicAuthCredentialsSource": "UserInfo" | "SaslInherit",
    "basicAuthUserInfo": "username:password",
    "sslKeystorePassword": "password"
  }
}

Kafka Topic

Overview

Apache Kafka is an end-to-end event streaming platform that:

Publishes (writes) and subscribes to (reads) streams of events from sources like databases, cloud services, and software applications.
Stores these events durably and reliably for as long as you want.
Processes and reacts to the event streams in real-time and retrospectively.

Those events are organized and durably stored in topics. These topics are then partitioned over a number of buckets located on different Kafka brokers.

Event streaming thus ensures a continuous flow and interpretation of data so that the right information is at the right place, at the right time for your key use cases.

Example use case

The Kafka Topic source supports real-time syncs.

Info tab

You can find the parameters in the Info tab below (Image 1).

Values

Parameter

Description

Example

Source tab

The following table outlines the mandatory and optional parameters you will find on the Source tab (Image 2).

The following parameters will help to define your data sync source and how it functions.

Parameter

Description

Example

To set up a real-time sync, you must configure your Listener values. You can do so through the Connections UI.

Note that If there is more than one listener associated with your data sync, you will need to configure the addition listeners via the Listener Configuration table.

Reset behaviour

Parameter

Description

Example

Topic JSON

The below table can be used to help create your Topic JSON needed to set up a real-time sync.

Parameter

Description

Example

Example Topic JSON

{
  "topicName": "<(mandatory) kafka topic name to listen messages on>",
  "messageFormat": "<(optional) Put "AVRO" if the messages are serialized in AVRO>"
}

Connection attributes

The below table can be used to help create your Connection Attributes JSON needed to set up a real-time sync.

Parameter

Description

{
  "bootstrapServers": "< (mandatory) kafka bootstrap servers in a comma-separated list in the form of host:port>",
  "saslMechanism": "<PLAIN|SCRAM-SHA-256|SCRAM-SHA-512>",
  "saslPassword": "",
  "saslUsername": "",
  "schemaRegistrySettings": {
    "url": "<(optional) This is required if your data follows a schema when serialized in Avro. A comma-separated list of URLs for schema registry instances that are used to register or lookup schemas. >",
   "basicAuthCredentialsSource": "<(optional) Specifies the Kafka configuration property "schema.registry.basic.auth.credentials.source" that provides the basic authentication credentials, this can be "UserInfo" | "SaslInherit">",
   "basicAuthUserInfo": "<(optional) Basic Auth credentials specified in the form of username:password>",
   "sslKeystorePassword": "<(optional) The client keystore (PKCS#12) password>"
  }
  "securityProtocol": "Plaintext | SaslPlaintext | SaslSsl"
}

The Schema section is where you define which source columns you want to sync in your connection. You can repeat the values for multiple columns.

Parameter

Description

Example

Select Show Advanced for more options for the Schema section.

Parameter

Description

Example

You can choose to add in a Transformation > String Replacement by inputting the following:

Parameter

Description

Example

Note that you can have more than one String Replacement

You have the option to add a source filter to your data sync. Please review the documentation here for more information on source filters.

Next steps

Configure your Destination
Define your Sync Actions.
Add in your Post Sync Scripts, if required.
If more than one listener is needed for a real-time sync, configure it/them via the Listener Config table.
To run a real-time sync, enable your Listener from the Execution tab.

Apache AVRO data format

Introduction

Apache AVRO was added as an inbound data format in Cinchy v5.3.

Apache AVRO (inbound) is a data format with added integration with the Kafka Schema Registry, which helps enforce data governance within a Kafka architecture.

Some of the benefits for using AVRO as a data format are:

It's compact
It has a direct mapping to/from JSON
It's fast
It has bindings for a wide variety of programming languages.

For more about AVRO and Kafka, read the documentation here.

Setting up the Connection to a Kafka Schema Registry

To set up the Apache AVRO connection to a Kafka Schema Registry, you will need to configure your Listener Configs table with the below specified attributes.

Listener Config Attributes

Topic Column

Name

Description

{ 
"topicName": "sampleavro", 
"messageFormat": "AVRO" 
}

Connections Attributes Column

Name

Description

{
  "bootstrapServers": "host:port",
  "schemaRegistrySettings": {
    "url": "URL"
    "basicAuthCredentialsSource": "UserInfo" | "SaslInherit",
    "basicAuthUserInfo": "username:password",
    "sslKeystorePassword": "password"
  }
}

Kafka Topic example config

Overview

This example syncs from a Kafka Topic source to a Cinchy Table target.

We want to sync the following data from Kafka and map it to the appropriate column in the "Sync Target 2" table in the "Kafka Sync" domain.

Kafka Source

Cinchy Column

UI Example

This is what the Connections UI will look like with the aforementioned example parameters and data.

Source tab

Your source tab should be set to "Kafka Topic" and have the following information (Image 1):

Tip: Click on an image in this document to enlarge it.

Column 1 (Standard Column) Parameters

Example Data

Column 2 ( Standard Column) Parameters

Example Data

Destination tab

Your destination tab should be set to Cinchy Table, and have the following information (Image 2):

Domain: The domain where your destination table resides. This example uses the "Kafka Sync" domain.

Table: The name of your destination table. This example uses the "Sync Target 2" table.

Degree of Parallelism: This is the number of parallel batch inserts and updates that can be run. Set this to 1 for our example.

Column 1 (Standard Column) Parameters

Example Data

Column 2 (Standard Column) Parameters

Example Data

Sync behaviour

Under the Sync Behaviour tab, we want to use the following parameters:

Synchronization Pattern: Full File
Sync Key Column Reference Name: Employee Id
New Record Behaviour: Insert
Dropped Record Behaviour: Delete
Change Record Behavior: Update

XML example

The following code is what the XML for our example connection would look like:

<?xml version="1.0" encoding="utf-16"?>
<BatchDataSyncConfig name="Kafka Sync" version="1.0.0" xmlns="http://www.cinchy.co">
    <KafkaTopicDataSource>
        <Schema>
            <Column name="$.employeeId" label="Employee Id" dataType="Number" isMandatory="false" validateData="false"/>
            <Column name="$.name" label="Name" dataType="Text" trimWhitespace="true" isMandatory="false" validateData="false"/>
        </Schema>
    </KafkaTopicDataSource>
    <CinchyTableTarget reconcileData="true" domain="Kafka Sync" table="Sync Target 2" suppressDuplicateErrors="false" degreeOfParallelism="1">
        <ColumnMappings>
            <ColumnMapping sourceColumn="$.employeeId" targetColumn="Employee Id"/>
            <ColumnMapping sourceColumn="$.name" targetColumn="Name"/>
        </ColumnMappings>
        <SyncKey>
            <SyncKeyColumnReference name="Employee Id"/>
        </SyncKey>
        <NewRecordBehaviour type="INSERT"/>
        <DroppedRecordBehaviour type="DELETE"/>
        <ChangedRecordBehaviour type="UPDATE"/>
        <PostSyncScripts/>
    </CinchyTableTarget>
</BatchDataSyncConfig>