Databases are a crucial engine for a world that is becoming more data driven. Businesses increasingly rely on smart insights and emerging patterns in their data to succeed.
Advancements in software and hardware have interacted with organizations' rising appetite for data-driven decision making. In this blog, one of the key inventors of modern databases illuminates the evolution of databases, the business use cases they serve, and a unique perspective from the ground floor of innovation.
Guest blogger: Adam Prout, Chief Technology Officer and Co-Founder of SingleStore
We founded MemSQL (the original name of SingleStore) in 2011. “Mem” signifies in-memory, and “SQL” makes it clear that you could indeed achieve speed, scale, and SQL without giving up the expressive power and advantages of relational algebra.

Nikita Shamgunov (Co-Founder of SingleStore) and I were seeing signals in the market. First, there was a notion that relational databases could not scale to the speed of modern apps, and NoSQL databases were thus rising in popularity. Some relational database vendors were slow to adapt during this period, and some companies were starting to build specialized databases in-house to address the challenge. Second, the hardware advancements that had historically accelerated software workloads by roughly 50% each year came to an end in the early to mid 2000s. At this point, CPUs began to be built with more processing cores, but the increase in processing power of each core slowed dramatically. Databases designed to be fast on systems with a single core, or only a few cores, required substantial redesign to run well on machines with many cores. This demanded more innovation on the database software side. One of the key challenges in distributed scale-out databases was how to deploy across many hosts with high availability and elasticity while keeping the familiar SQL interface. Solving that helped our customers mitigate the risks and costs of managing the complex ecosystems of tooling built around the mostly single-host SQL database technologies that existed at the time.
Co-developing with customers in gaming, banking and ridesharing
We then started looking for customers struggling to scale their existing SQL databases. Around 2011, we worked with a hot gaming company on a real-time analytics use case: monitoring how users interacted with the game to understand what they were doing in the moment and optimize the gaming experience. The company was also looking for early warning signs in customer behavior that might indicate bugs or performance issues affecting players. Previously, this customer had run the analytical workload on other types of databases, but conventional databases didn't handle it well. A data warehouse wasn't good at low-latency streaming ingestion and low-latency queries, and an operational database lacked the right storage technology to run complex analytical queries efficiently. Our database addressed these challenges and met their goals.
We built a lot of features working with that first gaming customer and then wanted to accelerate revenue growth. A banking customer had a very similar set of requirements. Their applications ingest market data, ticker traffic, internal data, and other proprietary data streams to support banking needs, and their existing databases could not deliver the low latency, high availability, and high query concurrency that their end-user-facing apps demanded. These apps and dashboards were built for their most important client segment, high-net-worth investors whose portfolios hold hundreds to thousands of positions, so the apps could not go down or slow down without serious consequences for those positions. Their clients needed to see live results, such as current market positions, and to maintain a continuously updated, visualized view of order transactions.
Another example in the banking segment combined these real-time streaming and analytics needs with elasticity and agility. This in-house application recommended how the bank should make trades; the use case required tracking portfolio risk using high-speed joins across position risk, index and fund composition, and other risk factors.
Interestingly, these needs were not limited to banking. A rideshare company uses SingleStore for a real-time marketing segmentation and targeting application used by thousands of employees across its marketing, product, and leadership teams. SingleStore helps them provide detailed, real-time data on more than 300 attributes across their rider and driver population. They can query things like behavior, cancellation rate, churn, and days since last trip, all with an average response time of 1 ms. They can see, by device, the language, location, status, and preferences of riders and drivers. If a driver has not taken a trip in the previous week, they can offer real-time incentives to keep that driver on the road, as sketched in the query below.
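As a rough sketch of the kind of segmentation query involved (the driver_attributes table and its columns are hypothetical, invented for illustration rather than taken from the company's actual schema):

```sql
-- Hypothetical query: find active drivers with no trip in the past week,
-- candidates for a real-time incentive.
SELECT driver_id, device, language, city, days_since_last_trip
FROM driver_attributes
WHERE status = 'active'
  AND days_since_last_trip > 7
ORDER BY days_since_last_trip DESC;
```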
Technology underpinnings of blazingly fast databases
SingleStore's patented Universal Storage combines the qualities of a rowstore and a columnstore into one unique table type. That is, it delivers the very fast table scan performance of a columnstore, yet supports highly concurrent point reads and writes with performance close to that of a rowstore. Our database also scales out horizontally using a distributed cluster architecture, providing high throughput and fast response times for query execution. Separating storage from compute allows for cost savings as well as improved performance and elasticity: SingleStore can keep data in blob storage without the point read/write performance penalty that cloud data warehouses typically pay. This is why SingleStore's single-core speed is 10x to 100x or more that of many legacy databases. It also maintains broad compatibility with the modern data processing ecosystem, accessible through standard SQL drivers and ANSI SQL syntax. SingleStore can simultaneously handle high-throughput ingestion (especially of streaming data), low-latency queries, high-performance reads and writes, scans, trickle inserts, and trickle deletes. We were excited to see our TPC benchmarking results and additional benchmarking tests.
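To make the dual rowstore/columnstore behavior concrete, here is a minimal sketch in SingleStore-style SQL. The trades table and its columns are hypothetical, and while the SHARD KEY, SORT KEY, and hash-index clauses reflect how SingleStore tables are commonly declared, treat this as an illustration rather than authoritative syntax:

```sql
-- A minimal sketch, assuming a hypothetical "trades" table.
-- A Universal Storage (columnstore) table pairs a SORT KEY for fast
-- analytical scans with a hash index for fast point lookups.
CREATE TABLE trades (
    trade_id BIGINT NOT NULL,
    symbol   VARCHAR(16) NOT NULL,
    price    DECIMAL(18, 4) NOT NULL,
    ts       DATETIME(6) NOT NULL,
    SHARD KEY (trade_id),      -- distributes rows across cluster partitions
    SORT KEY (ts),             -- orders columnstore segments for range scans
    KEY (trade_id) USING HASH  -- speeds up concurrent point reads/writes
);

-- Rowstore-like access: a low-latency point lookup.
SELECT price FROM trades WHERE trade_id = 42;

-- Columnstore-like access: a fast analytical scan over recent data.
SELECT symbol, AVG(price) AS avg_price
FROM trades
WHERE ts >= NOW() - INTERVAL 1 DAY
GROUP BY symbol;
```

The same table serves both access patterns, which is the point of unifying the two storage formats: no second copy of the data in a separate analytical system.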
This architecture was the culmination of our effort as we pivoted beyond in-memory databases, that is, toward building a more general-purpose distributed SQL database able to run a wide spectrum of workloads with strong performance. That's when we iterated toward unified storage, with an architecture that lets customers run operational and analytical workloads simultaneously while scaling resources to power the next generation of data-intensive applications.
Adapting databases for hybrid clouds
We also launched our managed service, a cloud database-as-a-service offering, about five years ago, and building features for it was different. Our work to separate compute and storage, with near-unlimited blob storage, was table stakes for the managed service. We had to rethink pricing and close feature gaps. Existing databases tend to become silos within a particular cloud, or even a particular region of a cloud. Instead of customers keeping multiple copies of data updated with different technologies, we want to run all of this transparently, so you can take your data and run data processing tasks wherever it is convenient.
Nowadays, even large banking customers are interested in migrating off on-premises deployments and doubling down on managed service deployments, because their customers require applications to run in multiple regions across multiple clouds. We expect demand for a multicloud control plane for databases to grow even higher.
Another key theme is using AI/ML within the database. Writing a tier 1 app is challenging for developers, so why not make the database side much easier? Intelligent databases with AI and ML can optimize and self-tune different aspects of a customer's application without developer intervention.
Experience to share with fellow technologists
I have built databases my whole career. I developed code, ran a team of engineers, became Chief Architect, and then Chief Technology Officer. Databases offer a diversity of technical areas that are in constant flux. You need to know basic machine learning, statistics, scheduling algorithms, data structures, network protocols, query languages, and runtime design, along with distributed systems and many other areas of computer science. It is endlessly interesting: you can master one thing and move on to the next while keeping an eye on what comes after that.
Working on a database with a small group of people in a start-up is special. At a large company, the way people react to these challenges is heavily colored by the company's existing market position and strongly held beliefs, and engineers don't always get to work on the right problems.
In a start-up, you don't know the right answer going in, and you feel raw market forces applied directly to your product and company; you can directly contribute to driving the next evolution of the market. The experience of tackling some of the hardest, highest-impact problems is almost impossible to find outside the start-up environment. Imagine David and Goliath! It is the ultimate setting for an engineer seeking growth opportunities and upside potential.
Looking ahead
We are excited to partner with IBM, whose customer relationships reach far beyond our own. Through the SingleStore-IBM OEM partnership, businesses can enjoy the best of both worlds: the blazingly fast database from SingleStore and the global scale and deep expertise in data and AI from IBM.
Source: ibm.com