Alex Mboutchouang

Understanding OData Filtering: A Comprehensive Guide

Alex Mboutchouang — Wed, 27 Nov 2024 07:00:33 GMT

Filtering data efficiently is a fundamental part of modern web development, especially when dealing with large datasets. In SharePoint, OData filtering is essential for querying data from lists and libraries. But what exactly is OData filtering, where does it come from, and when should you use it? Let's dive in.

The Lists Used in Our Examples

To illustrate how OData filtering works, we'll use a project management application scenario. The app has three SharePoint lists: Users, Projects, and Tasks. Each list has attributes that reflect real-world relationships in project management workflows. Here's an overview of these lists:

Users List

This list contains information about the users involved in the project.

Attributes:

Id (Number): A unique identifier for each user.
FullName (String): The full name of the user.
Email (String): The user's email address.
IsActive (Boolean): Whether the user is active.
Role (Choice): The role of the user (e.g., Manager, Developer, Tester).

Projects List

This list stores details about ongoing and completed projects.

Attributes:

Id (Number): A unique identifier for each project.
Title (String): The name of the project.
Budget (Number): The allocated budget for the project.
StartDate (DateTime): The project's starting date.
EndDate (DateTime): The project's planned or actual ending date.
Manager (Lookup): A lookup field referencing the Users list to indicate the project manager.
Status (Choice): The project's current status (Active, Completed, On Hold).

Tasks List

This list tracks tasks associated with projects.

Attributes:

Id (Number): A unique identifier for each task.
Title (String): A brief title for the task.
Description (String): A detailed description of the task.
DueDate (DateTime): The deadline for task completion.
Priority (Choice): The task's priority level (Low, Medium, High).
Status (Choice): The current task status (Not Started, In Progress, Completed).
AssignedTo (Lookup): A lookup field referencing the Users list for task assignment.
Project (Lookup): A lookup field referencing the Projects list to indicate the associated project.
Tags (Multi-Choice or Managed Metadata): Keywords or categories used to classify tasks (Documentation, Development, Testing).

Relationships Between the Lists

These lists are interlinked to reflect the relationships in a project management workflow:

Each Project is managed by a user from the Users list.
Tasks in the Tasks list are assigned to users and associated with specific projects.

Using this structure, you can create powerful queries to filter and retrieve data relevant to your application's needs. Let's now explore how OData filtering operates.

What is OData?

OData, short for Open Data Protocol, is a standardized protocol for querying and manipulating data over RESTful APIs. Microsoft originally developed it, and it is now an OASIS standard. It lets applications interact with data sources such as databases, file systems, and web services using a common query language.

OData is built on widely-used web standards such as:

HTTP for transport.
JSON or XML for payload formatting.
REST principles for resource interaction.

Think of OData as a SQL-like query language for web APIs. It is designed for distributed systems rather than for a single database.

How Does OData Filtering Work in SharePoint?

In SharePoint, the REST API leverages OData to retrieve data from lists, libraries, and other resources. Filtering using OData allows developers to specify criteria to narrow down results, reducing the amount of data transferred and improving performance.

An OData filter is passed through the $filter query option, which is appended to the endpoint URL. For example:

/_api/web/lists/getbytitle('Tasks')/items?$filter=Status eq 'Completed'
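In a real HTTP request, the spaces and quotes in the $filter expression have to be percent-encoded. The following minimal sketch builds such a URL with Python's standard urllib.parse module. The site URL and the build_items_url helper are hypothetical placeholders, not part of any SharePoint SDK.

```python
from urllib.parse import quote

# Hypothetical site URL; replace it with your own SharePoint site.
SITE_URL = "https://contoso.sharepoint.com/sites/pm"


def build_items_url(list_title: str, filter_expr: str) -> str:
    """Hypothetical helper: list-items endpoint with a percent-encoded $filter."""
    endpoint = f"{SITE_URL}/_api/web/lists/getbytitle('{list_title}')/items"
    # Keep OData punctuation readable; spaces become %20 and quotes %27.
    return f"{endpoint}?$filter={quote(filter_expr, safe='()/,')}"


print(build_items_url("Tasks", "Status eq 'Completed'"))
# ...items?$filter=Status%20eq%20%27Completed%27
```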

Available Data Types in OData Filtering

When working with OData, understanding the data types is essential as they determine what operations can be performed. SharePoint supports several data types, each with unique characteristics:

String

Examples: Titles, names, or descriptions ("Task Title", "John Doe").

Common Operations:

eq, ne: Equality and inequality.
startswith, endswith: Check prefixes or suffixes.
substringof: Partial matching.

Example:

/_api/web/lists/getbytitle('Projects')/items?$filter=startswith(Title, 'Test')
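OData string literals are wrapped in single quotes. A single quote inside the value is escaped by doubling it, so O'Brien becomes 'O''Brien'. The small helpers below keep that rule in one place. They are hypothetical and not part of any SharePoint library.

```python
def odata_string(value: str) -> str:
    """Hypothetical helper: quote a string as an OData literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def starts_with(field: str, prefix: str) -> str:
    """Hypothetical helper: build a startswith() clause for the given field."""
    return f"startswith({field}, {odata_string(prefix)})"


print(starts_with("Title", "Test"))          # startswith(Title, 'Test')
name = "O'Brien"
print("FullName eq " + odata_string(name))   # FullName eq 'O''Brien'
```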

Number

Examples: Integers and decimals used for IDs, counts, or numeric fields.

Common Operations:

eq, ne: Equality and inequality.
gt, ge, lt, le: Comparisons.

Example:

/_api/web/lists/getbytitle('Projects')/items?$filter=Budget gt 5000

Boolean

Examples: Yes/No fields like IsActive.

Common Operations:

eq: Equality (true or false).

Example:

/_api/web/lists/getbytitle('Users')/items?$filter=IsActive eq true

DateTime

Examples: Created dates, modified dates, or custom date fields.

Common Operations:

gt, ge, lt, le: Compare dates.

Example:

/_api/web/lists/getbytitle('Projects')/items?$filter=StartDate ge datetime'2024-01-01T00:00:00Z'
Or
/_api/web/lists/getbytitle('Projects')/items?$filter=StartDate ge '2024-01-01T10:30:00Z'
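When the date comes from code rather than being typed by hand, it is safer to format it from a timezone-aware datetime, so the literal always carries an explicit UTC marker. This sketch assumes the datetime'...' literal form shown above. The odata_datetime and due_in_month helpers are hypothetical. due_in_month builds a half-open range with ge and lt, which matches a whole month without worrying about the last second of the month.

```python
from datetime import datetime, timezone


def odata_datetime(dt: datetime) -> str:
    """Hypothetical helper: format an aware datetime as an OData datetime literal in UTC."""
    utc = dt.astimezone(timezone.utc)
    return "datetime'" + utc.strftime("%Y-%m-%dT%H:%M:%SZ") + "'"


def due_in_month(field: str, year: int, month: int) -> str:
    """Hypothetical helper: match dates in [first day of month, first day of next month)."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # The first day of the following month; December rolls over to January.
    end = datetime(year + month // 12, month % 12 + 1, 1, tzinfo=timezone.utc)
    return f"{field} ge {odata_datetime(start)} and {field} lt {odata_datetime(end)}"


print(due_in_month("DueDate", 2024, 12))
# DueDate ge datetime'2024-12-01T00:00:00Z' and DueDate lt datetime'2025-01-01T00:00:00Z'
```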

Lookup Fields

Examples: Fields referencing other lists or users (e.g., Author/Id).

Common Operations:

Use navigation properties with / to filter related data.

Example:

/_api/web/lists/getbytitle('Tasks')/items?$filter=AssignedTo/Id eq 10
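To filter on a property of a lookup, such as the assigned user's Title, SharePoint generally expects two things in the same request: the lookup expanded with $expand, and its projected fields listed in $select. For the Id alone, a flattened AssignedToId column is usually available as well. Treat both points as assumptions to verify against your SharePoint version. The hypothetical lookup_query helper below only assembles the query string.

```python
from urllib.parse import quote, urlencode


def lookup_query(select: list[str], expand: list[str], filter_expr: str) -> str:
    """Hypothetical helper: join $select, $expand and $filter into one encoded query string."""
    params = {
        "$select": ",".join(select),
        "$expand": ",".join(expand),
        "$filter": filter_expr,
    }
    # quote_via=quote encodes spaces as %20 instead of '+'; safe keeps OData punctuation.
    return urlencode(params, quote_via=quote, safe="$,/()")


print(lookup_query(["Title", "AssignedTo/Title"], ["AssignedTo"], "AssignedTo/Title eq 'John Doe'"))
```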

Choice and Managed Metadata

Examples: Choice fields like Status (Completed, In Progress) or managed metadata tags.

Common Operations:

eq, ne: Check specific values.

Example:

/_api/web/lists/getbytitle('Tasks')/items?$filter=Status eq 'Completed'

GUID

Examples: Unique identifiers for items or lists.

Common Operations:

eq, ne: Match specific GUIDs.

Example:

/_api/web/lists/getbytitle('Projects')/items?$filter=UniqueId eq guid'12345678-1234-1234-1234-123456789abc'
Or
/_api/web/lists/getbytitle('Projects')/items?$filter=UniqueId eq '12345678-1234-1234-1234-123456789abc'

Possible Operations in OData Filtering

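OData filtering offers a wide range of operations for building complex queries. Keep in mind that SharePoint's REST API implements only a subset of the OData standard. Some of the operators below, such as endswith or the arithmetic operators, may be rejected by SharePoint even though they are valid OData. Here's a breakdown: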

Comparison Operators

These operators compare values:

eq: Equal to.
ne: Not equal to.
gt, ge: Greater than, greater than or equal.
lt, le: Less than, less than or equal.

Logical Operators

They are used to combine multiple conditions:

and: Both conditions must be true.
or: At least one condition must be true.
not: Negates a condition.
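In OData, and binds more tightly than or, so mixed conditions are easier to get right with explicit parentheses. Here is a minimal sketch with two hypothetical helpers that always parenthesize their operands.

```python
def all_of(*clauses: str) -> str:
    """Hypothetical helper: join clauses with 'and', wrapping each in parentheses."""
    return " and ".join(f"({c})" for c in clauses)


def any_of(*clauses: str) -> str:
    """Hypothetical helper: join clauses with 'or', wrapping each in parentheses."""
    return " or ".join(f"({c})" for c in clauses)


# High-priority tasks that are either not started or in progress.
expr = all_of("Priority eq 'High'", any_of("Status eq 'Not Started'", "Status eq 'In Progress'"))
print(expr)
# (Priority eq 'High') and ((Status eq 'Not Started') or (Status eq 'In Progress'))
```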

String Functions

Operate on text fields:

startswith(Field, 'value'): Checks if the field starts with a value.
endswith(Field, 'value'): Checks if the field ends with a value.
substringof('value', Field): Checks if the field contains a value.

Arithmetic Operators

Perform calculations on numeric fields:

add, sub, mul, div, mod: Arithmetic operations.

Date Functions

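Handle date-specific queries:

Compare date fields with operators (gt, lt, etc.).
Use functions such as year(), month(), and day() to filter on part of a date, e.g. year(DueDate) eq 2024.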

Null Checks

Identify null or undefined values:

eq null, ne null: Check for nulls.

Navigation Properties

Query related or nested fields:

Use / to navigate lookup fields.

Collections

Operate on fields that store multiple values (e.g., multi-choice fields):

any, all: Apply conditions on collections.

Example:

/_api/web/lists/getbytitle('Tasks')/items?$filter=Tags/any(tag: tag eq 'Development')
Or
/_api/web/lists/getbytitle('Tasks')/items?$filter=Tags/all(tag: tag eq 'Testing')

Benefits of Using OData Filtering

Efficiency: Reduces the data the client fetches, saving bandwidth and processing time.
Standardization: Uses a well-documented standard, making it easy to learn and implement across different APIs.
Flexibility: Allows fine-grained control over what data is retrieved, including relationships and nested properties.
Improved performance: Queries are executed server-side, minimizing client-side processing.

Drawbacks of OData Filtering

Learning Curve: For developers unfamiliar with OData syntax, learning and adapting can take some time.
Limited Debugging: Troubleshooting complex queries can be tricky since errors may not always provide detailed insights.
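Scalability Concerns: Overly complex queries with many conditions or lookups can impact server performance. On large lists, SharePoint's list view threshold (5,000 items by default) can also cause filters on non-indexed columns to fail.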
Compatibility Issues: While OData is standardized, specific implementations (like SharePoint's REST API) might have limitations or variations.

When to Use OData Filtering

Use OData Filtering When:

You need to retrieve specific subsets of data from large lists or libraries.
You want to improve application performance by reducing data transfer.
You are working with dynamic filtering options, such as user-specific queries.

Avoid OData Filtering If:

The dataset is small and can be fetched entirely without performance concerns.
The filter conditions are highly complex and can be handled more efficiently on the client side or through alternative APIs (like Microsoft Graph).

Conclusion

OData filtering is a powerful SharePoint feature for querying data efficiently. By understanding the available data types and operations, SharePoint developers can write precise queries that improve application performance. Mastering these tools helps you build scalable, robust solutions while minimizing unnecessary data transfer and processing.

Whether you’re developing workflows, dashboards, or user-driven interfaces, leveraging OData filtering is essential for making your SharePoint applications smarter and more responsive.

Have questions about OData filtering or need help with your SharePoint project? Drop your thoughts in the comments or reach out. I’d love to hear from you!

Manage script configurations with Hydra

Alex Mboutchouang — Sun, 06 Oct 2024 06:00:07 GMT

Two years ago, I was working on my Master's project, which consisted of training an AI model for textual information retrieval. I had to train my models with different parameters to analyse their behaviour. At first, I changed the values directly in the code, which was very tedious. So I switched to passing the parameters on the command line with the argparse module from Python's standard library. As the number of parameters grew, this became really complicated, and that's when I discovered HYDRA. It definitely allowed me to move faster and complete my experiments. So I decided to write this brief article to introduce HYDRA and how it works. I hope it will be as helpful to others as it was to me.

What is HYDRA

HYDRA is a powerful open-source framework from Facebook Research that simplifies the creation of dynamic configurations. Hydra builds a configuration from YAML files, and any value in it can be overridden from the command line. The key features of HYDRA include:

Hierarchical configuration is composable from multiple sources.
Configuration can be specified or overridden from the command line.
Dynamic command line tab completion.
Run your application locally or launch it to run remotely.
Run multiple jobs with different arguments with a single command.

How does HYDRA work?

Installation

HYDRA is a Python package and can be installed from the Python Package Index (PyPI) with the following command:

pip install hydra-core

Managing configuration with HYDRA

To manage configurations with HYDRA, the first step is to write the configuration as a YAML file. The following example is saved as conf.yaml in the same directory as the script:

db:
  driver: postgres
  database: database_name
  user: username
  pass: password

social:
  google:
    client_id: 
    client_secret:

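This configuration can be used in a Python module as follows. Here, DBDriver and GoogleAuth stand for the application's own classes from the hypothetical database and socialauth modules; they are not part of Hydra. In this example, they print the settings they receive:

import hydra
from omegaconf import DictConfig, OmegaConf
from database import DBDriver  # hypothetical application module
from socialauth import GoogleAuth  # hypothetical application module

@hydra.main(version_base=None, config_path=".", config_name="conf")
def main(configs: DictConfig) -> None:
    db = DBDriver(**configs.db)
    google_provider = GoogleAuth(**configs.social.google)
    print(OmegaConf.to_yaml(configs))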

if __name__ == "__main__":
    main()

If you run this script with python main.py without any command-line overrides, you get the following output:

python main.py
{'driver': 'postgres', 'database': 'database_name', 'user': 'username', 'pass': 'password'}
{'client_id': None, 'client_secret': None}
db:
  driver: postgres
  database: database_name
  user: username
  pass: password

social:
  google:
    client_id: null
    client_secret: null

Thanks to HYDRA's flexibility, these values can also be changed at launch time without editing any file. The following example shows the output when some parameters are overridden:

python main.py db.driver=mysql db.database=example social.google.client_id=a4a4a4a4 social.google.client_secret=f5f5f5f5f5f5
{'driver': 'mysql', 'database': 'example', 'user': 'username', 'pass': 'password'}
{'client_id': 'a4a4a4a4', 'client_secret': 'f5f5f5f5f5f5'}
db:
  driver: mysql
  database: example
  user: username
  pass: password

social:
  google:
    client_id: a4a4a4a4
    client_secret: f5f5f5f5f5f5

Note that the values in the configuration file are only defaults. You can also declare parameters without a default value. The user must then supply them at runtime, much like required function arguments. Such mandatory values are marked with ???, as in the following configuration:

db:
  driver: postgres
  database: database_name
  user: username
  pass: password

social:
  google:
    client_id: ???
    client_secret: ???

If we run our program without passing values for the parameters social.google.client_id and social.google.client_secret we will get the following error:

python main.py
{'driver': 'postgres', 'database': 'database_name', 'user': 'username', 'pass': 'password'}
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/username/github/tutorials/hydra/main.py", line 9, in main
    google_provider = GoogleAuth(**configs.social.google)
omegaconf.errors.MissingMandatoryValue: Missing mandatory value: social.google.client_id
    full_key: social.google.client_id
    object_type=dict

Conclusion

Use data classes to write your Python classes quickly

Alex Mboutchouang — Mon, 30 Sep 2024 06:00:41 GMT

The Python programming language is world-renowned for its simplicity. As Python developers, we strive to write code that is elegant, concise, and easy to understand. Yet when defining classes that hold simple data structures, we often find ourselves drowning in boilerplate, writing the same standard methods each time. Luckily, Python's standard library includes a module that automatically adds methods such as __init__() and __repr__() to our classes: the dataclasses module. In this article, we explore how this module simplifies our code. Whether you're a seasoned Pythonista or just getting started, knowing how to use data classes can noticeably improve your productivity and the clarity of your code. By the end of this article, you'll have the basic knowledge needed to use data classes in your own projects. Let's dive in!

Basic Usage

Let's start with an example. Say we are developing a 3D game and need a class that represents a point in 3D space. Without data classes, the class looks like this:

class Point3d:
    def __init__(self, x: int, y: int, z: int):
        self.x = x
        self.y = y
        self.z = z

    def __repr__(self):
        return f"Point3d(x={self.x}, y={self.y}, z={self.z})"

    def __eq__(self, other):
        # Only compare with other Point3d instances
        if not isinstance(other, Point3d):
            return NotImplemented
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)

point1 = Point3d(1, 2, 8)
point2 = Point3d(3, 4, 5)

print(point1)
print(point2)

print(point1 == point2) # False
print(point1 == Point3d(1, 2, 8)) # True

The dataclasses module offers a streamlined way to define classes whose primary purpose is storing data. With just a few lines of code, you can create a data class that automatically gets common methods like __init__(), __repr__(), and __eq__(). Rewritten with the dataclasses module, our previous class looks like this:

from dataclasses import dataclass

@dataclass
class Point3d:
    x: int
    y: int
    z: int

point1 = Point3d(1, 2, 8)
point2 = Point3d(3, 4, 5)

print(point1)
print(point2)

print(point1 == point2) # False
print(point1 == Point3d(1, 2, 8)) # True

In this example, we use the @dataclass decorator from the dataclasses module to tell Python to generate the special methods of Point3d automatically. This saves a lot of boilerplate and lets developers focus on the program's real logic rather than on repetitive class definitions.

Customizing Data classes

While dataclasses provide convenient default behaviour, Python's flexibility allows us to customize their functionality to suit our specific needs. Whether it's setting default values, specifying ordering, or controlling mutability, data classes offer a range of customization options. Let's explore some of these customization features.

Setting Default Values

The dataclasses module allows us to specify default values for attributes. This makes it possible to create instances without providing values for all attributes.

from dataclasses import dataclass

@dataclass
class Point3d:
    x: int = 0
    y: int = 0
    z: int = 0

point1 = Point3d()
point2 = Point3d(3, 4, 5)
print(point1 == point2) # False
print(point1 == Point3d(0, 0, 0)) # True

In this example, we set all attributes to have a default value of 0. When creating a Point3d instance, if an attribute is not provided, it defaults to 0.
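Default values must not be mutable. A list, dict or set default is rejected with a ValueError, because every instance would share the same object. For mutable defaults, use field(default_factory=...), which calls the factory once per instance. The Task class below is a hypothetical example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    title: str
    # default_factory runs for each new instance, so tag lists are never shared.
    tags: list[str] = field(default_factory=list)


task1 = Task("Write docs")
task2 = Task("Fix bug")
task1.tags.append("Documentation")

print(task1.tags)  # ['Documentation']
print(task2.tags)  # []
```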

Specifying Ordering

Passing order=True to the @dataclass decorator makes instances orderable: the decorator generates the __lt__(), __le__(), __gt__(), and __ge__() methods.

from dataclasses import dataclass

@dataclass(order=True)
class Product:
    name: str
    price: float

product1 = Product("Laptop", 999.99)
product2 = Product("Smartphone", 699.99)

print(product1 < product2)  # Output: True

In this example, order=True makes instances of Product orderable. Instances are compared as if they were tuples of their fields, in the order the fields are defined, so name is compared first. Because "Laptop" sorts before "Smartphone", product1 < product2 is True.
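To sort by something other than the first field, exclude the other fields from comparisons with field(compare=False). In this sketch, only price takes part in ordering and equality, so the cheaper product sorts first.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Product:
    price: float
    # compare=False removes name from __eq__() and the ordering methods.
    name: str = field(compare=False)


products = [Product(999.99, "Laptop"), Product(699.99, "Smartphone")]
print([p.name for p in sorted(products)])  # ['Smartphone', 'Laptop']
```

One consequence of this choice: two products with the same price compare as equal, even if their names differ.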

Controlling Mutability

You can make attributes of a data class immutable by setting the frozen parameter to True in the @dataclass decorator.

from dataclasses import dataclass

@dataclass(frozen=True)
class Point3d:
    x: int
    y: int
    z: int

point = Point3d(1, 2, 3)
point.x = 4  # raises dataclasses.FrozenInstanceError

In this example, the Point3d class is immutable: once an instance is created, its attributes cannot be modified. Trying to assign to x raises FrozenInstanceError, a subclass of AttributeError:

Traceback (most recent call last):
  File "/home/username/github/blog/point3d.py", line 10, in <module>
    point.x = 4  # raises dataclasses.FrozenInstanceError
    ^^^^^^^
  File "<string>", line 4, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'x'
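Because a frozen instance cannot be changed in place, the usual way to "modify" it is to create a copy with dataclasses.replace(). This function builds a new instance with the given fields changed. Frozen data classes with the default eq=True are also hashable, so they can be used as set members or dictionary keys.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Point3d:
    x: int
    y: int
    z: int


point = Point3d(1, 2, 3)
moved = replace(point, x=4)  # a new instance; point itself is unchanged

print(point)  # Point3d(x=1, y=2, z=3)
print(moved)  # Point3d(x=4, y=2, z=3)
print(len({point, Point3d(1, 2, 3)}))  # 1: equal frozen instances hash the same
```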

By customizing dataclasses, you can change their behaviour to match your requirements, whether it's setting default values, controlling the order, or ensuring immutability. These customization options enhance the flexibility and power of dataclasses in Python.

Inheritance and Data classes

Inheritance is one of the most important features of object-oriented programming (OOP): it allows classes to inherit attributes and methods from parent classes. Inheritance works seamlessly with data classes, so child classes inherit both the fields and the behaviour of their parent data classes. The following code snippet shows an example:

from dataclasses import dataclass

@dataclass
class Animal:
    name: str
    sound: str

@dataclass
class Cat(Animal):
    breed: str
    num_legs: int = 4

@dataclass
class Dog(Animal):
    breed: str
    num_legs: int

dog = Dog(name="Buddy", sound="Woof", breed="Labrador",  num_legs=4)
cat = Cat(name="Misty", sound="Meow", breed="Siamese")

print(dog) # Dog(name='Buddy', sound='Woof', breed='Labrador', num_legs=4)
print(cat) # Cat(name='Misty', sound='Meow', breed='Siamese', num_legs=4)

In this example, we first create a class that represents an Animal with two attributes (name, sound). Then we create two subclasses (Dog and Cat), each adding two more attributes (breed, num_legs). The generated __init__() takes the parent's fields first, followed by the child's. Inheritance also works with regular methods, as the next example shows.

from dataclasses import dataclass

@dataclass
class Vehicle:
    brand: str

    def honk(self):
        return "Beep Beep!"

@dataclass
class Car(Vehicle):
    model: str

    def honk(self):
        return "HONK!"

car = Car(brand="Toyota", model="Camry")

print(car.honk())  # Output: HONK!

In this example, the Car class overrides the honk() method inherited from its parent class Vehicle to provide a different honking sound.

Understanding how inheritance works with dataclasses allows you to build hierarchies of classes that share common attributes and behaviours while still allowing for customization and specialization in subclasses.

Performance Considerations

If you use data classes in performance-critical applications, it helps to know where their runtime costs come from. A data class is an ordinary Python class whose methods are generated once, when the decorator runs. As a result, its instances generally perform like those of an equivalent hand-written class. The following points describe where costs can appear:

Memory Overhead

Each data class instance stores its attributes in a per-instance __dict__, just like an instance of a regular class. This is rarely a problem. It can matter, however, when you create very large numbers of instances or when memory usage is critical.

Attribute Access Overhead

Because @dataclass returns a regular class, reading and writing attributes costs the same as in a hand-written class; the decorator adds no extra layer. If attribute access is a hot spot, slots=True (Python 3.10+) may make it slightly faster.

Initialization Overhead

Data classes automatically generate an __init__() method equivalent to one you would write by hand. The code generation itself happens only once, at class-definition time. Per-instance work grows only if you add a __post_init__() method, use default_factory, or make the class frozen. In the frozen case, the generated __init__() has to bypass the blocked __setattr__().

Comparison Overhead

Data classes automatically generate __eq__() and, with order=True, the ordering methods. These compare all fields in definition order, so the cost grows with the number of fields. For classes that are compared very often, consider excluding irrelevant fields with field(compare=False).

Serialization Overhead

Data classes do not serialize to JSON on their own, but dataclasses.asdict() and dataclasses.astuple() convert instances to plain dictionaries and tuples. These helpers recurse into nested data classes, lists, and dictionaries, and they copy other values with copy.deepcopy(). That can be noticeably slower than reading the attributes directly.
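For JSON, a common pattern is to convert the instance with dataclasses.asdict() and pass the result to json.dumps(). asdict() recurses into nested data classes, which is convenient, but it copies the whole structure, so it is worth measuring on large object graphs. The Project and Task classes here are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Task:
    title: str
    done: bool = False


@dataclass
class Project:
    title: str
    tasks: list[Task] = field(default_factory=list)


project = Project("Website", [Task("Design"), Task("Deploy", done=True)])
# asdict() turns the nested Task instances into plain dicts as well.
print(json.dumps(asdict(project)))
# {"title": "Website", "tasks": [{"title": "Design", "done": false}, {"title": "Deploy", "done": true}]}
```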

Conclusion

Data classes in Python offer an amazing approach to defining classes for storing data, reducing boilerplate code and improving code readability. By leveraging automatic method generation and customization options, developers can focus on solving problems rather than wrestling with class definitions.

While data classes provide many benefits, it's important to consider potential performance overhead in performance-critical applications. By understanding the trade-offs and making informed decisions, developers can effectively harness the power of data classes to build robust and maintainable Python codebases.

As you explore data classes further, experiment with different use cases and discover new ways to leverage their power. Thanks for reading, and see you soon for a new article.

Reference

https://docs.python.org/3/library/dataclasses.html

Python structural pattern matching

Alex Mboutchouang — Mon, 27 May 2024 06:00:30 GMT

Of the new features announced with Python 3.10, structural pattern matching was, for me, one of the most innovative. Many people describe it as the switch...case statement known from other popular languages like PHP or JavaScript. However, structural pattern matching offers much more, as we will explore in this blog post. It provides a powerful mechanism for matching complex data structures and integrates seamlessly with Python's existing features.

Pattern Matching

As the Wikipedia article on pattern matching describes it, pattern matching in computer science is checking a given sequence of tokens for the presence of the constituents of some pattern. The pattern can be a string or any other kind of data. Pattern matching is a good alternative to chains of conditional statements and results in cleaner, more readable code. It simplifies complex conditional logic and makes the code more self-explanatory.

Python's Structural Pattern Matching

In Python, the match statement introduces structural pattern matching, a more concise and expressive way to handle conditional logic. if-elif-else chains test arbitrary boolean conditions, and switch-case statements in other languages typically compare a value for equality. match, by contrast, checks the structure of a value (its type, length, keys, or attributes) and can bind parts of it to variables in the same step.

Syntax and Basic Usage

The match statement (Python 3.10 and later) compares a value against the patterns defined in its case clauses. Patterns are tried from top to bottom, and only the block of the first matching pattern runs. The _ wildcard pattern matches anything and serves as a catch-all. Guards, written with the if keyword after a pattern, add extra conditions.

def classify_value(value: int):
    match value:
        case 0:
            print("Zero")
        case n if n > 0:
            print("Positive")
        case n if n < 0:
            print("Negative")

classify_value(5)   # Output: Positive
classify_value(-3)  # Output: Negative
classify_value(0)   # Output: Zero

In this example, the classify_value function takes an integer and matches it against three cases. The first is the literal 0. The other two combine a capture pattern n with a guard, one for positive values and one for negative values. Depending on the input, the function prints the corresponding classification.

It also works very well with strings as we can see in the following example.

def check_string(value: str):
    match value:
        case "apple":
            print("It's an apple")
        case "banana":
            print("It's a banana")
        case _:
            print("It's something else")

check_string("apple")   # Output: It's an apple
check_string("banana")  # Output: It's a banana
check_string("orange")  # Output: It's something else

In this example, the check_string function takes a string value as input and uses the match statement to match it against different patterns defined in the case clauses. If the input string matches one of the specified patterns ("apple" or "banana"), the corresponding action is executed. Otherwise, the _ wildcard pattern catches any unmatched values, and the default action is executed, indicating that it's something else.

Advanced Patterns

Pattern matching is not limited to basic types in Python; it also works with more complex types like lists, dictionaries, and tuples. Here are some examples.

from typing import List

def match_list(lst: List[int]):
    match lst:
        case [1, 2, _]:
            print("The first two elements are 1 and 2")
        case [x, y, z]:
            print(f"The list has exactly 3 elements: {x}, {y}, {z}")
        case _:
            print("List does not match")

match_list([1, 2, 3])       # Output: The first two elements are 1 and 2
match_list([4, 5, 6])       # Output: The list has exactly 3 elements: 4, 5, 6
match_list([7, 8, 9, 10])   # Output: List does not match

In this example, the match_list function takes a list as input and matches it against the patterns defined in the case clauses:

The first pattern checks whether the list has exactly three elements, the first two being 1 and 2.
The second pattern checks whether the list has exactly three elements and binds them to x, y, and z.
If none of these cases matches, the _ wildcard pattern catches the value and the function prints "List does not match".

def match_tuple(tup):
    match tup:
        case (0, _):
            print("Tuple starts with zero")
        case (_, "hello"):
            print("Tuple contains 'hello'")
        case _:
            print("Tuple does not match")

match_tuple((0, "world"))    # Output: Tuple starts with zero
match_tuple((42, "hello"))   # Output: Tuple contains 'hello'
match_tuple((10, "bye"))     # Output: Tuple does not match

This example shows how match works with tuples. Both patterns match only two-element sequences. The first checks whether the first element is 0, and the second checks whether the second element is the string "hello".

Pattern matching with Custom Classes

As Python developers, we write our own classes in most projects, and match works with them too. A custom class can define a __match_args__ class attribute. It is a tuple of attribute names that tells match which attribute each positional sub-pattern in a class pattern, such as Point(0, 0), refers to.

class Point:
    __match_args__ = ('x', 'y')
    def __init__(self, x, y):
        self.x = x
        self.y = y

def match_instance(obj):
    match obj:
        case Point(0, 0):
            print("Origin Point")
        case Point(x, y):
            print(f"Point at ({x}, {y})")
        case _:
            print("Not a Point instance")

point1 = Point(0, 0)
point2 = Point(1, 1)
point3 = Point(2, 2)

match_instance(point1)   # Output: Origin Point
match_instance(point2)   # Output: Point at (1, 1)
match_instance(point3)   # Output: Point at (2, 2)
match_instance("Hello")  # Output: Not a Point instance

In this example, we define a class Point with two attributes (x and y) and set __match_args__ to ('x', 'y'), so positional sub-patterns map to x and y, in that order. The match_instance function then matches the object it receives against the following cases:

Point(0, 0): This pattern checks if the provided object is the origin(Point with x=0 and y = 0).
Point(x, y): This pattern matches any other Point instance (one where x != 0 or y != 0) and binds its attributes to x and y.
If none of these cases matches, the _ wildcard pattern catches the value and the function prints "Not a Point instance".

Error Handling with Pattern Matching

Another useful application of structural pattern matching in Python is error handling. It can replace nested isinstance() checks with a flat list of cases, as the following use case shows.

class CustomErrorType1(Exception):
    def __init__(self, message):
        super().__init__(message)
        self.message = message

class CustomErrorType2(Exception):
    def __init__(self, message):
        super().__init__(message)
        self.message = message


def process_data(data):
    match data:
        case {"status": "success", "result": result}:
            print("Data processing successful. Result:", result)
        # Class patterns check the exception type and extract its message.
        case {"status": "error", "error": CustomErrorType1(message=message)}:
            print("Error of Type1 occurred during data processing:", message)
        case {"status": "error", "error": CustomErrorType2(message=message)}:
            print("Error of Type2 occurred during data processing:", message)
        case {"status": "error"}:
            print("Unknown error occurred during data processing")
        case _:
            print("Invalid data format")

# Example usage
data1 = {"status": "success", "result": 42}
data2 = {"status": "error", "error": CustomErrorType1("Data not found")}
data3 = {"status": "error", "error": CustomErrorType2("unauthorized access")}
data4 = {"status": "invalid"}

process_data(data1)  # Output: Data processing successful. Result: 42
process_data(data2)  # Output: Error of Type1 occurred during data processing: Data not found
process_data(data3)  # Output: Error of Type2 occurred during data processing: unauthorized access
process_data(data4)  # Output: Invalid data format

This approach provides a more elegant and readable way to handle different cases, making the code more expressive and reducing the need for nested if-else blocks or cumbersome try-except constructs.

Best Practices and Tips

A common question about structural pattern matching in Python is when to use it instead of traditional control flow constructs such as if-elif-else. Here are some points to consider when deciding which one to use.

Structured Data: Pattern matching excels when working with structured data, such as dictionaries, tuples, or custom data types, where different patterns can be matched against specific attributes or values.
Multiple Conditions: If you have multiple conditions to check and handle based on the structure of data, pattern matching can provide a more concise and readable solution compared to nested if-elif-else blocks.
Error Handling: Pattern matching can be particularly useful for error handling, especially when dealing with custom exceptions or complex error scenarios. It lets you handle different error or exception types, and the actions associated with each, in one clearly structured block.
Simple Conditions: For simple conditional checks where you're only comparing values or evaluating boolean expressions, traditional control flow constructs like if-elif-else may be more straightforward and familiar.
Readability: In some cases, using if-elif-else statements may lead to more readable code, especially when the logic is straightforward and doesn't involve complex patterns or data structures.
Legacy Codebases: If your codebase must still run on Python versions earlier than 3.10, where the match statement is unavailable, or if your team is not yet familiar with pattern matching, sticking with traditional control flow constructs may be more appropriate for consistency and readability.

Conclusion

In conclusion, structural pattern matching in Python offers numerous benefits, including improved code readability, simpler conditional logic, and cleaner error handling. By providing a concise and expressive syntax for matching patterns within data structures, Python's pattern-matching feature helps developers write cleaner and more maintainable code. I encourage you to explore and experiment with this feature in your own projects and use it to streamline your code. To delve deeper into structural pattern matching and its applications, I recommend consulting the official Python documentation and exploring additional resources available online.


Understanding the Differences Between Database, Data Warehouse, and Data Lake

Alex Mboutchouang — Sun, 28 Apr 2024 17:07:48 GMT

In today's digital landscape, the phrase "data is the new oil" resonates more than ever, underscoring the pivotal role that data plays in shaping our modern world. As our lives become increasingly intertwined with technology, decisions across virtually every facet of life are informed by data. It's no wonder then, that organizations are pouring significant resources into the collection, storage, processing, and analysis of data.

Enter the concepts of databases, data warehouses, and data lakes, the cornerstones of modern data management. These systems form the backbone of organizations' efforts to harness the power of data, enabling them to glean insights and drive informed decision-making.

But what exactly do these terms entail, and why are they essential in today's data-driven era? Join us as we delve into the intricacies of databases, data warehouses, and data lakes, unravelling their roles and uncovering the key considerations that underpin their utilization in the ever-evolving landscape of data management.

Databases

A database is a collection of data that is organized and stored for easy access, retrieval, and management. It typically uses a schema to define the structure of the data and supports operations like querying, updating, and deleting.

Key Features

Provides efficient storage and retrieval of data.
Supports transaction processing, ensuring data integrity and consistency.
Allows for concurrent access by multiple users.
Suitable for applications requiring real-time data access and updates.
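To illustrate the transaction-processing point, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the accounts table and the transfer function are hypothetical. A CHECK constraint rejects negative balances, and using the connection as a context manager commits on success and rolls back if any statement fails, so a transfer either happens completely or not at all.

```python
import sqlite3


def create_accounts() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, source: str, target: str, amount: int) -> bool:
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            # Credit first, so a failing debit shows that the credit is undone too.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, target))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, source))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


def balances(conn: sqlite3.Connection) -> dict[str, int]:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

After transfer(conn, "alice", "bob", 30) the balances are alice 70 and bob 80; a later transfer(conn, "bob", "alice", 500) returns False and leaves both balances unchanged.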

Data Warehouses

A data warehouse is a centralized repository that stores structured, organized data from one or more sources. It is optimized for querying and analysis, typically using Online Analytical Processing (OLAP) tools, and is designed to support decision-making processes.

Key Features

Integrates data from various sources, providing a unified view for analysis.
Supports complex queries and analytics to uncover insights and trends.
Provides historical data for trend analysis and reporting.
Enhances data quality through data cleaning and transformation processes.
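The analytical side can be sketched with a tiny star schema: a fact table of sales joined to a product dimension and aggregated by category and year. This uses sqlite3 purely as a stand-in (a real warehouse would run on a dedicated analytical engine), and the table names and figures are invented for illustration.

```python
import sqlite3


def build_warehouse() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Dimension table (descriptive attributes) and fact table (measurable events).
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT NOT NULL);
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            sale_year  INTEGER NOT NULL,
            amount     REAL NOT NULL
        );
    """)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "books"), (2, "games")])
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 2022, 10.0), (1, 2023, 15.0), (2, 2023, 40.0), (2, 2023, 20.0)],
    )
    return conn


def revenue_by_category_and_year(conn: sqlite3.Connection) -> list[tuple]:
    # A typical analytical query: join facts to a dimension, then aggregate.
    query = """
        SELECT p.category, f.sale_year, SUM(f.amount)
        FROM fact_sales AS f JOIN dim_product AS p USING (product_id)
        GROUP BY p.category, f.sale_year
        ORDER BY p.category, f.sale_year
    """
    return conn.execute(query).fetchall()
```

With the sample rows, the query yields books 10.0 for 2022, books 15.0 for 2023, and games 60.0 for 2023.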

Data Lakes

A data lake is a centralized repository that stores vast amounts of raw, unstructured, and semi-structured data in its native format. It allows organizations to store data without the need for a predefined schema, enabling flexible processing and analysis.

Key Features

Accommodates diverse data types and formats, including text, images, videos, and sensor data.
Enables data exploration and discovery without upfront schema design.
Supports advanced analytics, including machine learning and big data processing.
Scales easily to handle large volumes of data, including streaming data sources.
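The "no predefined schema" idea is often described as schema-on-read: raw records are kept exactly as they arrived, and structure is imposed only when someone queries them. In this minimal sketch, a Python list of JSON strings stands in for files in object storage; RAW_EVENTS and read_clicks are hypothetical names.

```python
import json
from typing import Any

# Hypothetical raw events, stored as-is with no shared schema.
RAW_EVENTS = [
    '{"type": "click", "user": "u1", "page": "/home"}',
    '{"type": "sensor", "device": "d7", "temp_c": 21.5}',
    '{"type": "click", "user": "u2", "page": "/pricing", "ref": "ad"}',
]


def read_clicks(raw_lines: list[str]) -> list[dict[str, Any]]:
    """Apply a 'click' schema at read time, ignoring records of other shapes."""
    clicks = []
    for line in raw_lines:
        record = json.loads(line)
        if record.get("type") == "click":
            # Optional fields are tolerated: a missing "ref" becomes None.
            clicks.append({"user": record["user"], "page": record["page"], "ref": record.get("ref")})
    return clicks
```

A database would instead enforce the schema when data is written (schema-on-write), rejecting the sensor record from a clicks table outright.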

How do they differ?

Databases, data warehouses, and data lakes differ in structure, use cases, and the types of data they handle. Databases hold structured data (relational databases) or semi-structured data (NoSQL databases); relational databases rely on predefined schemas, while many NoSQL databases allow more flexible ones, and databases in general are ideal for transaction processing and real-time data access. Data warehouses require structured data and excel at analytics, reporting, and decision support, often integrating diverse data sources. Data lakes, by contrast, store raw data without a predefined schema, which supports exploratory analysis, big data processing, and storage of structured, unstructured, and semi-structured data, offering great flexibility in data handling.

Databases

Databases are best suited to storing operational data, because they provide efficient storage and retrieval.

Data warehouses

When you need to integrate data from one or more sources for analytics, reporting, and decision-making, data warehouses are more suitable.

Data lakes

Data lakes are best suited to situations where you need to store and analyze large volumes of diverse data types, including unstructured and semi-structured data, and when you want to perform exploratory analysis or advanced analytics.

Conclusion

In conclusion, databases, data warehouses, and data lakes each offer unique strengths and cater to distinct use cases within the realm of data management. By comprehending the advantages and limitations of these systems, organizations can make informed decisions when devising their data management strategies. Whether it's the real-time processing capabilities of databases, the analytical prowess of data warehouses, or the flexibility and scalability of data lakes, a nuanced understanding empowers organizations to leverage the right tools for their specific needs, ultimately driving efficiency, innovation, and success in the ever-evolving landscape of data-driven decision-making. I hope this article has provided you with a brief description of what databases, data warehouses and data lakes are and when to use them. Thanks for reading and see you soon for a new article.
