When Data Reliability and Scale is Everything, Why We Turned to Open Source

For so many of today’s enterprises, ensuring data quality and then reliably delivering that data to clients and partners without interruption is foundational to trust. Our Nevada-headquartered real estate valuation company, Clear Capital, is squarely in this category. We provide property valuation solutions, and our vast data stores are at the core of the value we need to provide.

To keep up with the evolving needs of our clients – some of the world’s largest financial institutions – we had to build out a new technology platform robust enough to deliver increasingly data-intensive solutions and become a true competitive differentiator. Specifically, our platform needed the data availability and performance to meet strict client SLA requirements around uptime and latency, as well as the scalability to address both expected and unexpected growth.

We needed to keep costs reasonable while building our data stack. Our data availability SLAs hold us to 99.92% uptime, which equates to roughly seven hours of allowable outage time per year. We also punch above our weight as a medium-sized enterprise serving global financial industry clients. Onboarding a new client can require suddenly processing 100,000 additional property valuations every month, tremendously expanding the data volume in our systems.
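To make the uptime figure concrete, the short Python sketch below converts an SLA percentage into an outage budget. It is only an illustration of the arithmetic: the function name `downtime_budget_hours` and the 365-day year are our own assumptions, not part of any SLA wording.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def downtime_budget_hours(uptime_percent: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of outage permitted in a period at a given uptime percentage."""
    return (1 - uptime_percent / 100) * period_hours


if __name__ == "__main__":
    yearly = downtime_budget_hours(99.92)
    monthly = downtime_budget_hours(99.92, period_hours=HOURS_PER_YEAR / 12)
    print(f"Yearly budget:  {yearly:.2f} hours")
    print(f"Monthly budget: {monthly * 60:.0f} minutes")
```

At 99.92%, the yearly budget comes to about seven hours, or a little over half an hour in an average month, which leaves very little room for unplanned outages or maintenance windows.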

So, we knew we’d need particularly cost-effective data technologies with the ability to scale instantly whenever needed. At the same time, from a client perspective, any missing or unavailable data could create instant doubt in the minds of property valuation professionals over whether what they’re seeing is a true and accurate picture. The data simply has to be there, all the time and every time.

We turned to open source, selecting Apache Cassandra and Apache Solr as the core data-layer components for the new technology platform. These choices proved prudent: the linearly scalable, fault-tolerant NoSQL Cassandra database offered the scalability, high availability, and performance needed to alleviate our data concerns. Solr offered distributed indexing, replication, and load-balanced querying, with similarly high reliability and scalability proved out during our vetting process.

We had the internal knowledge to handle the development and operational aspects of our new technology platform. However, one strategy we decided on was enlisting outside support that could better ensure a clean implementation of these open source technologies.

With these goals in mind, we initially made this shift over to the NoSQL database world via DataStax, a proprietary (open core) data solution built on Cassandra that leverages Solr for indexing. However, we soon discovered that our DataStax-provided data infrastructure didn’t quite have the flexibility that we truly required.

We were simultaneously gripped in a constant state of concern over the availability and scalability of data. At their root, these concerns were created by the need to further modernize our stack, embracing a cloud-first strategy and microservices architecture.

While DataStax could provide appropriate data access speeds, we were immediately rocked by scaling and unbalanced node issues. These were caused by our monolithic architecture, within which our databases and indices called the same EnterpriseDB and Oracle servers home. At the same time, our customers and our industry were signaling an appetite for quicker and better appraisal methods, which we had the data sets to provide going forward, if we could just support them with the right infrastructure.

It quickly became clear that we needed to achieve a cloud-based microservices architecture that allowed smoother node balancing and more rapid re-indexing. That infrastructure would enable us to be as dynamic as our customers’ requirements.

This next transition meant embracing another seismic change: a total shift to the cloud and AWS. To accomplish this, we also shifted away from DataStax and decided to tap open source data-layer provider Instaclustr.

This change was driven by Instaclustr’s expertise and its ability to extend that knowledge and confidence to our internal team, as well as its ability to deliver those data technologies in their pure open-source form. By relying on non-commercialized open source, we could ensure our future freedom from vendor lock-in while maintaining crucial portability and control over our data.

For a company like Clear Capital, allowing an external partner to touch any data is a sensitive decision that cannot be made without tremendous trust. In our case, we were able to validate that trust early on, with our provider willing to guide the data migration from our previous proprietary solution to the non-proprietary version of Cassandra. This included a step-by-step migration plan, complete with detailed strategies for replicating active data and performing rollbacks if necessary.

Ultimately, the migration was completed without any downtime and left us with a fully implemented Cassandra and Solr deployment. We’ve stuck with the managed data-layer strategy, which has enabled our internal teams to focus their activities on product development, more quickly delivering features and capabilities that positively impact customer experiences.

With open source Cassandra and Solr in place, we now benefit from data-layer technology that brings fully reliable availability and straightforward (and lower-cost) scalability to onboarding new clients of any size.

Our new platform stores two billion real estate valuations, delivers a 7x increase in performance compared to what we had before, and ensures that all valuations can be indexed for analysis in seconds.