Big Compute
Azure Synapse Analytics: How serverless is replacing the data warehouse
Serverless data architectures enable leaner data insights and operations. How do you reap the rewards while avoiding the potential pitfalls?
Benchmarking Azure Synapse Analytics - SQL Serverless, using .NET Interactive
There is a new service in town that promises to transform the way you query the contents of your data lake. Azure Synapse Analytics comes with a new offering called SQL Serverless, allowing you to query your data on demand with no need for pre-provisioned resources. When we heard about the new service we were keen to get involved, so for the last 10 months we've been working with the SQL Serverless product group to provide feedback on the service and to help ensure it meets our customers' needs. During this time we've put it through its paces by implementing a range of real-world use cases. We were particularly interested to see how it stacked up as a replacement for Azure Data Lake Analytics, for which to date there has been no clear and easy migration path.
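To give a flavour of the programming model (the workspace endpoint, storage path and authentication mode below are illustrative, not taken from the benchmark itself): SQL Serverless speaks ordinary TDS, so you can query files in the lake with OPENROWSET from any .NET client, including a .NET Interactive notebook.

```csharp
using System;
using Microsoft.Data.SqlClient;

// Illustrative on-demand endpoint and data lake path; substitute your own.
const string connectionString =
    "Server=mysynapseworkspace-ondemand.sql.azuresynapse.net;" +
    "Database=master;Authentication=Active Directory Interactive;";

// SQL Serverless queries files in the lake directly, no tables required.
const string query = @"
    SELECT TOP 10 *
    FROM OPENROWSET(
        BULK 'https://mydatalake.dfs.core.windows.net/data/telemetry/*.parquet',
        FORMAT = 'PARQUET') AS rows;";

using var connection = new SqlConnection(connectionString);
connection.Open();

using var command = new SqlCommand(query, connection);
using var reader = command.ExecuteReader();
while (reader.Read())
{
    Console.WriteLine(reader[0]);
}
```

Because you pay per query rather than for provisioned capacity, a sketch like this costs nothing while it isn't running.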
Does Azure Synapse Analytics spell the end for Azure Databricks?
Have you invested, or are you about to invest, in Azure Databricks? If so, the new Spark offering in Azure Synapse Analytics is likely to have grabbed your attention, and rightly so. Why is Microsoft putting yet another Spark offering on the table, and what does it mean for you?
Recording of Azure Oxford talk on combatting illegal fishing with Azure (for less than £10/month)
Jess and Carmel recently gave a talk at Azure Oxford on "Combatting illegal fishing with Machine Learning and Azure - for less than £10 / month". The recording of that talk is now available for viewing! The talk focuses on the recent work we completed with OceanMind. They run through how to construct a cloud-first architecture based on serverless and data analytics technologies, and explore the important principles and challenges in designing this kind of solution. Finally, we see how the architecture we designed through this process not only provides all the benefits of the cloud (reliability, scalability, security), but, because of the pay-as-you-go compute model, has a compute cost that we could barely believe!
Building a proximity detection pipeline
At endjin, our approach focuses on using the scientific experimental method to support fully proven and tested decision making, and on using scientific research to support our work. This post runs through how we applied that process to creating a pipeline to detect vessel proximity. It is based on the project we recently worked on with OceanMind, in which we helped them build a #serverless architecture that could detect vessel proximity in close to real time. The vessel proximity events we detected were then fed into machine learning algorithms in order to detect illegal fishing! Carmel also runs through some of the actual calculations we used to detect proximity, how we used #data projections to efficiently process large quantities of incoming data, and the use of #durablefunctions to orchestrate the processing.
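The post covers the real calculations in depth; as a taste of the kind of maths involved, here is a minimal sketch of a standard haversine great-circle distance check (a textbook formula, not necessarily the exact implementation used in the project):

```csharp
using System;

// A standard haversine great-circle distance between two coordinates,
// the kind of calculation a vessel proximity check relies on.
static double HaversineDistanceKm(
    double lat1, double lon1, double lat2, double lon2)
{
    const double EarthRadiusKm = 6371.0;
    static double ToRadians(double degrees) => degrees * Math.PI / 180.0;

    double dLat = ToRadians(lat2 - lat1);
    double dLon = ToRadians(lon2 - lon1);

    double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
             + Math.Cos(ToRadians(lat1)) * Math.Cos(ToRadians(lat2))
             * Math.Sin(dLon / 2) * Math.Sin(dLon / 2);

    return 2 * EarthRadiusKm * Math.Asin(Math.Sqrt(a));
}

// Two vessels are "in proximity" if they are within some threshold, e.g. 1 km.
bool inProximity = HaversineDistanceKm(51.5, -0.1, 51.505, -0.095) < 1.0;
Console.WriteLine(inProximity);
```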
Optimising C# for a serverless environment
In our recent project with OceanMind we used #AzureFunctions to process marine vessel telemetry from around the world. This involved processing huge quantities of data in close to real time. We optimised our processing for a serverless environment, with the outcome that the compute would cost less than £10 / month! This post summarises some of the techniques we used, including some concrete examples of optimisations we made.
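By way of illustration (this is a general serverless optimisation, not necessarily one lifted from the post), a classic example is sharing expensive clients across function invocations instead of recreating them on every call:

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class TelemetryFunction
{
    // Created once per host instance, not per invocation: avoids socket
    // exhaustion and per-call allocation in a consumption-plan function.
    private static readonly HttpClient Client = new HttpClient();

    [FunctionName("ForwardTelemetry")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req)
    {
        // Hypothetical downstream endpoint, purely illustrative.
        var response = await Client.PostAsync(
            "https://example.com/ingest", new StreamContent(req.Body));

        return new StatusCodeResult((int)response.StatusCode);
    }
}
```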
Building a secure data solution using Azure Data Lake Store (Gen2)
In this blog from the Azure Advent Calendar 2019 we discuss building a secure data solution using Azure Data Lake. Data Lake has many features which enable fine-grained security and data separation. It is also built on Azure Storage, which lets us take advantage of all of that platform's features while keeping ADLS a cost-effective storage option! This post covers some of the great features of ADLS and walks through an example of how we build our solutions using this technology!
Speaking at NDC London: Combatting illegal fishing with Machine Learning and Azure
In January 2020, Carmel is speaking about creating high performance geospatial algorithms in C# which can detect suspicious vessel activity, which is used to help alert law enforcement to illegal fishing. The input data is fed from Azure Data Lake Storage Gen 2, and converted into data projections optimised for high-performance computation. This code is then hosted in Azure Functions for cheap, consumption based processing.
C#, Span and async
The addition of ref struct types, most notably Span<T>, opened C# to a range of high performance scenarios that were impractical to tackle with earlier versions of the language. However, they introduce some challenges. For example, they do not mix very well with async methods. This article shows some techniques for mitigating this.
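As a minimal sketch of one such technique (illustrative names throughout): because a Span<T> cannot live across an await, you can keep all the span manipulation inside a synchronous helper method and have the async method call it between awaits.

```csharp
using System;
using System.Threading.Tasks;

public static class SpanAsyncExample
{
    // All span manipulation stays inside this non-async method; ref struct
    // locals are fine here because there is no await to cross.
    private static int ParseValue(ReadOnlySpan<char> text)
    {
        ReadOnlySpan<char> trimmed = text.Trim();
        return int.Parse(trimmed);
    }

    public static async Task<int> ReadValueAsync(Func<Task<string>> readLine)
    {
        // The async method deals only in heap-friendly types (string here)...
        string line = await readLine();

        // ...and delegates the span-based processing to the sync helper.
        return ParseValue(line.AsSpan());
    }
}
```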
Increasing performance via low memory allocation in C#
We worked on a project recently which required us to build a highly performant system for processing vast quantities of messages in real time. We had made the decision to run this processing using Azure Functions with C#. This post runs through some of the techniques we used for writing highly performant, low allocation code, including data streaming, list preallocation and the relatively new C# feature: Span<T>.
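To give a flavour of two of those techniques (these are illustrative sketches, not excerpts from the project code), list preallocation and span-based slicing both cut allocations out of a hot path:

```csharp
using System;
using System.Collections.Generic;

public static class LowAllocationExamples
{
    public static List<int> PreallocatedList(int expectedCount)
    {
        // Sizing the list up front avoids repeated internal array
        // resizes (and the garbage they generate) as items are added.
        var results = new List<int>(expectedCount);
        for (int i = 0; i < expectedCount; i++)
        {
            results.Add(i);
        }
        return results;
    }

    public static int SumCsv(ReadOnlySpan<char> line)
    {
        // Slicing a span creates no new strings: each field is just a
        // window onto the original characters.
        int sum = 0;
        while (!line.IsEmpty)
        {
            int comma = line.IndexOf(',');
            ReadOnlySpan<char> field = comma < 0 ? line : line[..comma];
            sum += int.Parse(field);
            line = comma < 0 ? ReadOnlySpan<char>.Empty : line[(comma + 1)..];
        }
        return sum;
    }
}
```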
Running Azure functions in Docker on a Raspberry Pi 4
For one of my first experiments with the Raspberry Pi 4, I decided to get an Azure Function running in a Docker container. This post gives a step-by-step guide on how to do it, as well as providing code you can use as a starting point for your own experiments.
Import and export notebooks in Databricks
Sometimes it's necessary to import and export notebooks from a Databricks workspace. This might be because you have some generic notebooks that are useful across numerous workspaces, or it could be that you're having to delete your current workspace for some reason and therefore need to transfer its content over to a new one. Importing and exporting can be done either manually or programmatically. In this blog, we outline a way to recursively export/import a directory and its files from/to a Databricks workspace.
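As a rough sketch of the programmatic route (this calls the Databricks Workspace REST API directly; the class and parameter names are illustrative, and the blog's own approach may differ), a recursive export looks something like this:

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;
using System.Threading.Tasks;

public class WorkspaceExporter
{
    private readonly HttpClient client;

    public WorkspaceExporter(string workspaceUrl, string personalAccessToken)
    {
        client = new HttpClient { BaseAddress = new Uri(workspaceUrl) };
        client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", personalAccessToken);
    }

    // Walk a workspace folder recursively, exporting each notebook's source.
    public async Task ExportDirectoryAsync(string workspacePath, string localPath)
    {
        Directory.CreateDirectory(localPath);

        string listJson = await client.GetStringAsync(
            $"/api/2.0/workspace/list?path={Uri.EscapeDataString(workspacePath)}");

        using var doc = JsonDocument.Parse(listJson);
        if (!doc.RootElement.TryGetProperty("objects", out var objects)) return;

        foreach (var obj in objects.EnumerateArray())
        {
            string path = obj.GetProperty("path").GetString()!;
            string type = obj.GetProperty("object_type").GetString()!;
            string name = Path.GetFileName(path);

            if (type == "DIRECTORY")
            {
                await ExportDirectoryAsync(path, Path.Combine(localPath, name));
            }
            else if (type == "NOTEBOOK")
            {
                // The export endpoint returns the notebook as base64 content.
                string exportJson = await client.GetStringAsync(
                    $"/api/2.0/workspace/export?path={Uri.EscapeDataString(path)}&format=SOURCE");
                using var export = JsonDocument.Parse(exportJson);
                byte[] content = Convert.FromBase64String(
                    export.RootElement.GetProperty("content").GetString()!);
                await File.WriteAllBytesAsync(Path.Combine(localPath, name), content);
            }
        }
    }
}
```

Importing works the same way in reverse, posting base64-encoded content to the workspace/import endpoint.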
Demystifying machine learning using neural networks
Machine learning often seems like a black box. This post walks through what's actually happening under the covers, in an attempt to demystify the process! Neural networks are built up of neurons. In a shallow neural network we have an input layer, a "hidden" layer of neurons, and an output layer. In deep learning there are simply more hidden layers, which allows the network to combine neurons' inputs and outputs to build up a more detailed picture. If you have an interest in machine learning and what is really happening, definitely give this a read (WARNING: some algebra ahead...)!
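As a minimal sketch of the building block involved (illustrative code, not from the post): a neuron is just a weighted sum of its inputs plus a bias, passed through an activation function, and a layer is many neurons applied to the same inputs.

```csharp
using System;
using System.Linq;

public static class NeuralNetworkBasics
{
    // A single neuron: weighted sum of inputs plus a bias, squashed by an
    // activation function (the sigmoid here).
    public static double Neuron(double[] inputs, double[] weights, double bias)
    {
        double sum = inputs.Zip(weights, (x, w) => x * w).Sum() + bias;
        return Sigmoid(sum);
    }

    private static double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));

    // A "hidden layer" is just many neurons applied to the same inputs;
    // deep networks stack more of these layers.
    public static double[] Layer(double[] inputs, double[][] weights, double[] biases)
        => weights.Select((w, i) => Neuron(inputs, w, biases[i])).ToArray();
}
```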
Using Databricks Notebooks to run an ETL process
Here at endjin we've done a lot of work around data analysis and ETL. As part of this we have done some work with Databricks Notebooks on Microsoft Azure. Notebooks can be used for complex and powerful data analysis using Spark, a "unified analytics engine for big data and machine learning". Spark allows you to run data analysis workloads and can be accessed via many APIs, meaning you can build up data processes and models using a language you feel comfortable with. Notebooks can also be run as an activity in an ADF pipeline, and combined with Mapping Data Flows to build up a complex ETL process which can be run via ADF.
Exploring Azure Data Factory - Mapping Data Flows
Mapping Data Flows are a relatively new feature of ADF. They allow you to visually build up complex data transformation sequences. This can aid in the streamlining of data manipulation and ETL processes, without the need to write any code! This post gives a brief introduction to the technology, and what this could enable!
Avoiding deployment locking errors by running Web and Functions Apps from packages
This post walks through the fix for DLL locking errors when trying to deploy an Azure Function. The solution was to switch over to the new "deploy from package" option when deploying the functions. This fixes the file locking problem because instead of deploying the DLLs, the function will run from a package file added to its directory.
A conversation about .NET, The Cloud, Data & AI, teaching software engineers and joining endjin with Ian Griffiths
When he joined endjin, Technical Fellow Ian sat down with founder Howard for a Q&A session. This was originally published on LinkedIn in 5 parts, but is republished here, in full. Ian talks about his path into computing, some highlights of his career, the evolution of the .NET ecosystem, AI, and the software engineering life.
Using Python inside SQL Server
Do you have a bunch of data in SQL Server that you're pulling into Python over ODBC/JDBC? With SQL Server's Python integration you can connect to a SQL Server instance from your preferred IDE and perform the computations on the SQL Server machine itself. No more clunky data transfers. Operationalising a Python model or script is as easy as calling a stored procedure, so any application that can talk to SQL Server can invoke the Python code and retrieve the results. Easy! This blog provides a few simple examples which make use of this capability, so you can get up and running as quickly as possible.
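As a hedged sketch of that last point (the connection string and Python payload are illustrative, and the instance must have external scripts enabled): here is a .NET client invoking sp_execute_external_script and reading the results back like any other query.

```csharp
using System;
using Microsoft.Data.SqlClient;

// Illustrative connection string; requires SQL Server Machine Learning
// Services with 'external scripts enabled' turned on.
const string connectionString =
    "Server=localhost;Database=master;Integrated Security=true;TrustServerCertificate=true;";

// sp_execute_external_script runs the Python in-instance; whatever is
// assigned to OutputDataSet comes back as an ordinary result set.
const string sql = @"
    EXEC sp_execute_external_script
        @language = N'Python',
        @script = N'
import pandas as pd
OutputDataSet = pd.DataFrame({""squared"": [x * x for x in range(5)]})';";

using var connection = new SqlConnection(connectionString);
connection.Open();

using var command = new SqlCommand(sql, connection);
using var reader = command.ExecuteReader();
while (reader.Read())
{
    Console.WriteLine(reader[0]);
}
```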
Snap Back to Reality – Month 2 & 3 of my Apprenticeship
Learn what types of things an apprentice gets up to at endjin a few months after joining. You could be learning about Neural Networks: algorithms which mimic the way biological systems process information. You could be attending Microsoft's Future Decoded conference, learning about Bots, CosmosDB, IoT and much more. Hopefully, you wouldn't be in hospital after a ruptured appendix!
How to plan your cloud transformation journey
We've been helping customers adopt Microsoft Azure since 2010, and along the way we have produced a lot of thought leadership to help people think about the steps required, the risks involved, and how to plan a successful adoption.
AWS vs Azure vs Google Cloud Platform - Mobile Services
AWS vs Azure vs Google Cloud Platform - Database
AWS vs Azure vs Google Cloud Platform - Compute
Embracing Disruption - Financial Services and the Microsoft Cloud
We have produced an insightful booklet called "Embracing Disruption - Financial Services and the Microsoft Cloud" which examines the challenges and opportunities for the Financial Service Industry in the UK, through the lens of Microsoft Azure, Security, Privacy & Data Sovereignty, Data Ingestion, Transformation & Enrichment, Big Compute, Big Data, Insights & Visualisation, Infrastructure, Ops & Support, and the API Economy.
Azure Batch - Time is Money in Big Compute
Consumption-based pricing is one of the USPs of cloud PaaS services, but the default settings aren't necessarily optimised for cost. Significant savings can be made by understanding your workload.
Azure Machine Learning–experimenting with training data proportions using the SMOTE module
Spinning up 16,000 A1 Virtual Machines on Azure Batch
We recently completed a technical proof of concept to see if the new Azure Batch service could scale to meet the demands of a Big Compute workload.