Welcome to EnDavid.com. Here you can find a compendium of things I have published and other things I have made in my spare time.

If you get lost, try visiting the Site Map.

On this main page you can find my blog, where I keep track of updates to this site and post technical articles from time to time. If you are interested, subscribe to the RSS feed.


Can Generative AI create original art?
Sun, 10 Nov 2024 15:21:27 +0000
A robot drawing a posing robot on a canvas
A robot drawing a posing robot on a canvas, drawn with DALL-E.

GenAI as a collage of mental percepts

During my PhD I developed an app called Sketch-to-Collage. It was a tool to create collages using simple color strokes as input. It used image retrieval to extract blobs from an image dataset and combine them into a new image that resembled the input sketch. See the example below, and this video: Sketch-to-Collage (Siggraph 2007 Posters).

Sketch-to-collage example
The sketch on the left is used to retrieve the images in the middle and create the collage on the right.

Because I had to write scientific papers, I needed a method to evaluate the quality of the proposed techniques. In image retrieval, a common strategy is measuring recall and precision. With that in mind, I used the concept of “mental percepts”, a mental representation of the perceived world state. The sketch from the user represents a draft of their visual mental percepts, so I could use the sketch to measure the recall and precision of my app. It’s a bit more complex than that, but that’s the main idea. You can read my PhD thesis here: “Region-based image abstraction for retrieval and synthesis” [Tokyo Tech, 2008] (PDF).
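As a refresher (this generic sketch is not from the thesis), precision and recall can be computed from the set of retrieved items and the set of relevant ones; in this context the items would be retrieved images or regions rather than plain values:

  // Standard information-retrieval metrics, shown as a generic sketch;
  // the thesis applies them to retrieved image regions for a given sketch.
  func precisionAndRecall<T: Hashable>(retrieved: Set<T>, relevant: Set<T>) -> (precision: Double, recall: Double) {
      let hits = Double(retrieved.intersection(relevant).count)
      let precision = retrieved.isEmpty ? 0 : hits / Double(retrieved.count)
      let recall = relevant.isEmpty ? 0 : hits / Double(relevant.count)
      return (precision, recall)
  }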

Representation of mental percepts
Visual Perception model

To me, Generative AI, or GenAI for short, sounds like a very sophisticated version of such a tool. Your mental percept can now be represented by words, if the input is a text prompt. But you can also use images as input, or combine several inputs, or even chain several tools, e.g. you could use a collage created with my tool as input to generate a better image from it.

Sketch to collage to anime
A sketch turned into a collage, and then transformed with AI Mirror app into anime style.

Originality and copyright

Can I collage anything and stay original? We need to check the images used in the original data set. If the dataset contains a copyrighted drawing of an apple and I use it in my collage, I may be infringing copyright, even if one could argue that the collage itself is an original piece of work.

But what if I simply sample the color of the apple to paint another fruit? Are colors copyrighted? And what if I copy the brush stroke lines as well? Perhaps you see where I’m going: with GenAI it is sometimes hard to find traces of any original images because the final result is a “collage” of very small image features.

Another thing to take into account when thinking of originality is the limit of what we can create. As with my “sketch-to-collage” app, GenAI depends on an initial dataset for training, so there needs to be an a priori match between your mental percepts and what’s already there. That means it won’t be able to draw a cat if it has never seen one.

What is art?

We have questioned the originality of GenAI, but there’s a more fundamental question to answer: can GenAI produce art at all?

Wikipedia says that there’s no generally agreed definition of what constitutes art, but it gives this one:

Art is a diverse range of human activity and its resulting product that involves creative or imaginative talent, generally expressive of technical proficiency, beauty, emotional power, or conceptual ideas.

The word “human” suggests that it must be made by humans to be art. I also highlighted the word “talent”, because I think many will agree that we need to appreciate some skill that excels in some way.

Here’s my attempt to give my own definition, based on the above, but rewritten so it’s easier to evaluate whether something is art or not:

Art is any piece of work created by combining mental percepts into something to be consumed by human senses, using a set of sophisticated skills. It evokes some emotion in the consumer because of its quality or originality.

Let us discuss my definition through some examples (please leave comments on each section on Medium if you want to join the discussion).

A banana duct-taped to a wall

A banana duct-taped to a wall
“Comedian”, a banana duct-taped to a wall, by Maurizio Cattelan. Antonio García Villarán's book on “hamparte”, to the right.

❌🙅‍♂️ This doesn’t match my definition of art because it doesn’t require any sophisticated skills or talent, even though this is sold as art in art galleries.

Spanish Artist and communicator Antonio García Villarán tagged that piece as “hamparte”, a term he popularized to define art fiascos. “Hamparte” is a portmanteau of the words “hampa” (group of rascals, scammers, or con men) and “arte” (art). Since the word “con art” already exists in English with a different meaning, let me create a new portmanteau for “hamparte” in English with the words “fiasco” and “art”: fiascart.

A pianist performance of “Clair de Lune”

✅🙆‍♂️ This matches my definition of art, because you need skills to play the piano and it evokes emotions in the people who listen to it. It may not be original, but it has quality.

This art is co-created by two people: the composer and the performer. The composer puts down their mental percepts on paper in the form of music notation. But that notation may not fully represent how they feel it should be played. They may write some “prompts” in the score to express that it should be played softer, or “virtuoso”. But in the end it’s the performer’s own mental model and interpretation that we hear.

A photograph taken with an iPhone

4 photographs from endavidg Instagram
4 photographs I took with my iPhone, from my Instagram account (endavidg)

❌🙅‍♂️ There are no skills needed for this, so I would argue it’s not art. That doesn’t mean the picture is not pretty. Perhaps you have travelled far to take a picture of the Northern Lights, the Aurora Borealis. But the travel is not art and the picture that you take with your phone isn’t art either.

What we can derive from this is that not all pretty pictures are art. Instagram is mostly not art. These pictures can evoke emotions as well, and they can be of high quality. They can even be original, if they capture some rare event. But if they don’t require skills, I wouldn’t consider them art.

One could argue that the engineers who created the iPhone were the ones who had the skills. But the engineers had no direct intention of taking that particular photo you’ve taken. The skill needs to be in the person who directly creates the artwork.

A painting by Miró

3 paintings by Miro
Some paintings by Miró. From left to right: Peinture sur fond blanc pour la cellule d'un solitaire, Poem III, Woman dreaming of escape.

❌🙅‍♂️ This may be controversial, but Antonio García Villarán considers Miró’s work “hamparte”, or “fiascart”, as I defined earlier. I do agree it’s not art. But it’s not the same as a banana on a wall either, so I wouldn’t classify it as “fiascart”. I think Miró’s drawings are designs. A design is created with artistic tools, but it doesn’t need to be art.

For instance, a straight line on a white T-shirt is a design. But it doesn’t require any special skills to draw such a line. Similarly, if you can’t tell Miró’s drawings apart from those of a 5-year-old, it’s because what you see is not art. That doesn’t mean the design doesn’t evoke any emotions or that it’s not valuable. But as with the iPhone photograph, it’s just an interesting picture.

A drawing made with Photoshop

A dragon drawn on paper, and with Procreate
A dragon I drew on paper, and then with Procreate on iPad

🤔 This depends on how much talent is needed to create that drawing. If anyone can do it with a few clicks, it’s not art. Digital tools make our lives much easier. For instance, I use Procreate on iPad to draw clean brush strokes that I wouldn't be able to reproduce with a real brush. Above there’s a dragon I drew with ink and color pencils, and then the same drawing I made with Procreate. You can still recognize my style in the Procreate drawing, but the finish may look a bit neater. Neither of them can be considered art, I suppose, because I’m not talented enough to create something of good quality. They can be categorized as “artistic expression”, or simply as cute drawings.

Andy Warhol was a pioneer in digital drawing. He created some iconic drawings with Deluxe Paint on the Amiga computer. Drawing with the mouse wasn’t easier than drawing on paper at the time, though. There were many talented pixel artists back then who spent a lot of time and effort on their creations.

A GenAI image from a text prompt

❌🙅‍♂️ I would argue that a GenAI image from a text prompt is not art. Even if the prompt is very long and complicated, I wouldn’t consider that an artistic skill. That’s why people that master the current prompt inputs are called “prompt engineers”, and not “prompt artists”. Also note that AI is getting simpler to use, so there may no longer be any need for “engineering” either.

Some people may confuse originality with art. Even if we assume that GenAI can be original, originality alone does not define art. If there’s no talent involved, a GenAI image is just another pretty image, only created by other means than an iPhone camera.

A drawing improved with GenAI

3 drawings of Lunafreya
Left: my attempt to draw a fan art of Lunafreya with starscourge, from Final Fantasy XV; middle: drawing improved with AI Mirror app; right: AI Mirror being more inventive.

🤔 There are two things to check here: the drawing and the transform. If the drawing could be considered art by itself, and the transform doesn’t make the original drawing unrecognizable, I would say it can be considered art. If we accepted a Photoshop drawing by a talented artist as art, why not in this case? The artist could be applying premade textures with Photoshop on their drawing, and it would still be art.

Another way to think of it is that we allow a certain level of “cheating”, i.e. the use of certain good tools. If I were talented and the drawing I made with Procreate (left in the figure above) could be considered art, the image generated with GenAI in the middle may still be art because the original drawing is still recognizable. However, the image on the right is a completely different thing. It does represent the same mental percepts I had in mind, that is, a woman with blue hair and black tears, but it doesn’t require much talent to obtain that level of quality. In this example, it completely hides my lack of talent, so it should raise suspicion.

A GenAI song from an original poem

🤔 This is somewhat similar to creating a GenAI image from an original drawing. But there’s a difference: there’s no music at all to begin with. If the poem can be considered art, only the poem will be art, but not the song. I think this is more similar to taking a photograph of the Mona Lisa. That photograph contains a piece of art, but the photograph itself is not art.

(The video above is a poem I wrote in 2009, “Sonnet pretensions of Pluto”, now turned into a song with Suno AI.)

A GenAI song from an original song

🤔 This is close to the case of creating a GenAI image from an original drawing. It works the same way: if the original song can be considered art, then the result will also be art if the original can still be recognized in it. The linked YouTube video above is a song I created with Suno AI by making a cover of my own voice. It produces something of decent quality from something that sounds awful (if you are not afraid, here's my embarrassing original voice: “Sonnet pretensions of Pluto”, improvised version by myself). Again, it is not art because it completely hides my lack of talent. It’s just a nice song, in the same way we had nice GenAI drawings, nice iPhone photographs, and interesting designs.

A drawing from a GenAI image

Drawings of seals using GenAI as reference
GenAI images generated with DALL-E for reference, and the drawings I made for Inktober 2024

✅🙆‍♂️ If the drawing is made by a talented artist, I would consider that art (not the case for my drawings of seals above, shared for illustration purposes). If we accept that searching for reference images on the Internet is not cheating, then writing a text prompt to generate an image for inspiration should be OK as well. The worry is not whether it is art or not, but whether that art is original. For instance, in the extreme case that the training dataset contains only images of Van Gogh sunflowers, you’d be copying a Van Gogh drawing.

In music, the equivalent would be a band creating a song with AI and then playing it live. The performance may be art, but there would be a worry of whether that song is original or not.

So let’s go back to the subject of originality.

Can humans be original?

After that long discussion of what art is, I think we can conclude that there are many pretty or interesting creations, either created by humans or by GenAI, that are not necessarily art. Another takeaway is that art doesn’t need to be original. A good reproduction of Las Meninas or a good performance of Gymnopédie can be considered art.

Even if GenAI output is not art, it may still have some value. And that value may be greater if it represents something original. I discussed this earlier when I talked about originality and the limits of what GenAI can create, depending on the dataset used for training. But are humans any different? Do we have any limits?

For me, the creative process starts as “collaging” in the brain; that’s why I talked about collages in the introduction. We copy and paste mental percepts in our brains and then we try to put the result down on paper. When drawing, we repeat this “copy-pasting” by copying familiar shapes, or familiar drawing styles. GenAI just reveals the tricks that most mortals use. Only a few individuals can be truly inventive. For instance, in pop music Björk and Rosalia come to mind. Everyone else is mostly repeating the same chords, using the same instruments, and repeating the same patterns.

There may even be a limit to originality. Anime faces may all look the same because it’s hard to create original faces and still keep the distinctive anime look. And even something particular to a person, like their voice, may not be original. Some of the songs I created with Suno AI have familiar voices. This song I made from a Spanish poem I wrote 20 years ago, “Si así debiera ser en el mar”, sounds like Spanish singer Anna Torroja. And this other one, “Word Wormhole”, something I wrote for the first chapter of my attempt at a sci-fi book (“Surrealitales”), sounds to me as if sung by David Bowie. Are those singers in the dataset, or do other singers simply have a similar tone, so the “collaging” of voices ends up sounding familiar? Since the AI is a black box, it’s hard to know what went into creating the final result.

I think it is very hard to be truly original. That’s why I put the “or” clause in my definition of art. But originality is orthogonal to talent, i.e. it’s another dimension. That means that we can create something original and interesting by itself, and GenAI can help us improve its quality.

Saturation of the senses and the rise of mediocrity

Since I work in Computer Graphics, I have studied a bit how cameras work, in terms of lenses, sensors, and light interactions. However, I never dug deeper into photography because I became saturated with digital photographs early on. When I moved to Tokyo in 2002, all phones in Japan already had cameras. They weren’t great at the time, just 240×320 pixels on my first mobile phone. But it already meant that tons of photographs were shared continuously. It didn’t help that I focused my research on creating a database of such photographs and worked on image retrieval. Then, when Facebook and later Instagram appeared, we got bombarded with photographs from all our family and friends.

At the beginning those photographs were not of very good quality. But now, with smartphones with intelligent processing that create beautiful moon pictures that are more fake than real, your Instagram feed probably looks amazing. It doesn’t help that phones tend to saturate colors to make images more appealing. In the end our visual cortex receives too many stimuli and we may be too tired to appreciate real art.

Let’s pause for a bit and think about what we discussed earlier: most of what we see is probably not art. It’s just pretty pictures. But since everyone can create pretty pictures now, we may become immune to beauty. GenAI exacerbates the problem because now it’s even easier for everyone to create pretty pictures.

And it’s not only pretty pictures. We can create all kinds of appealing content. Now we can write like a professional novelist, or create catchy songs with just a few clicks. I’ve been pretty excited myself, especially with music. It’s so cool to be able to listen to my poems turned into songs. But when I hear a silly song about eating cookies all day made by a 6-year-old, I get a bit down, because I start thinking that perhaps my poems were crap to begin with and it’s just the music that makes them sound good.

Real talent is hidden in a sea of mediocrity. And the problem is that our senses can no longer tell the difference.

Conclusions

I’ve provided a definition of art that tries to reflect what I believe to be our common intuition of what art is: a piece of work for the senses created by a talented person, either original or of great quality. Under that definition, GenAI content is generally not art, because it can be produced by anyone without requiring any special skills or talent. However, it can be argued that GenAI may produce original content, or as original as the human “mental collaging” process can be.

Let me summarize my final thoughts in a few bullet points:

  • The majority of the pretty pictures we see are not art.
  • Art doesn’t need to be original.
  • Original creations aren’t necessarily art.
  • Our senses are increasingly saturated with stimuli.
  • Non-talented people can now create good quality content that looks like art.
  • Real talent will increasingly become hidden in a sea of good quality GenAI content.

This means that digital artists may be replaced by GenAI in some places, unfortunately. But on the bright side, I suppose people will appreciate even more the art created with human hands, like paintings and sculptures, or the performing arts, like theater or live concerts. Art won’t disappear, and with more people creating content, there will be more chances of finding interesting and original works.



Concurrency recipes in Swift and C++
Mon, 04 Mar 2024 11:16:24 +0000
Screenshot of the benchmark with different running times

Introduction

There are many ways to write code that runs in parallel. In the Apple world it was common to use functions from Grand Central Dispatch (GCD), but in Swift 5.5 async and await became first-class citizens of the language. That means there’s now a more Swift-like way to handle concurrent code.

I personally get confused by the meaning of some of the keywords across languages. For instance, the async keyword in Swift is not the same as std::async in the C++ standard library. That’s why I wanted to compare at least Swift to C++.

This article shows different ways of running a loop in parallel in both Swift and C++, and I also compare the running times using a problem from Advent of Code as a benchmark.

Advent of Code “Jengatris” as benchmark

I’ve taken the problem from day 22 of Advent of Code 2023 as my benchmark. It’s a problem I like because it’s easy to visualize. In fact, I first solved it using the Godot Engine, just so I could see it in 3D from the start.

Please read the full description of the problem on the AoC website for details. But the blunt summary is that you have to solve a game that is like a mixture of Tetris and Jenga, so I refer to it as “Jengatris”. 3-dimensional bricks or “pieces” fall from above until they stack. Then, in the first part of the problem, you have to find out which pieces are “essential”, that is, pieces that, if removed, would make the pieces above them fall. If a piece is supported by more than one piece, then those supporting pieces are not “essential” (according to my definition of “essential”; of course, if you remove both of them, the one above will fall).

In the second part of the problem, which is what we are interested in for the benchmark, you have to count how many pieces would fall if you removed each of the essential pieces. The answer to the problem is the sum, over all the essential pieces, of the number of pieces that would fall.

Here’s a video made in Godot showing the result:

In the first half of the video I let the pieces fall to find the solution to the first part. Then the screen flashes a bit because there are 974 essential pieces for my input, so I have to quickly simulate 974 scenarios to get the sum. Finally, you can see me remove all the essentials, just to watch more pieces fall (this is not part of the problem, just for fun).

Copyable game state

Because we have to simulate each possibility for all the essential pieces, it is very convenient if we can easily create copies of the game state. This will also be very helpful when computing several solutions in parallel, because we want to avoid sharing any data, which prevents race conditions.

In Swift we can simply use a struct, because structs in Swift are passed by value. In contrast, classes are passed by reference, so no data gets duplicated. So to be able to copy the whole game state, I simply declared it like this:

  struct GameState {
      var pieces: [AABB<Int>]
      var volume: VoxelVolume<Int>
  }

A game state contains a list of pieces described as integer Axis-Aligned Bounding Boxes (AABBs), and what I called a Voxel Volume, which is a 3-dimensional integer matrix that stores the ID of each piece, so we know whether each integer coordinate is occupied or not.
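The actual VoxelVolume is in the repository linked later in this article; as a rough sketch of the idea (not the real implementation), it could be a flat array indexed by (x, y, z):

  // A rough sketch of a voxel volume backed by a flat array; the real
  // implementation in the repository may differ. Each cell stores a piece ID,
  // or -1 when the cell is empty.
  struct VoxelVolumeSketch {
      let width: Int, height: Int, depth: Int
      private var cells: [Int]

      init(width: Int, height: Int, depth: Int) {
          self.width = width
          self.height = height
          self.depth = depth
          cells = Array(repeating: -1, count: width * height * depth)
      }

      subscript(x: Int, y: Int, z: Int) -> Int {
          get { cells[x + width * (y + height * z)] }
          set { cells[x + width * (y + height * z)] = newValue }
      }
  }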

Note that in C++ a struct behaves very differently. In C++ a struct is basically the same as a class, except that all its members are public by default. So to be really explicit that I want to duplicate the game state, I added a ‘copy’ function, and explicitly deleted the copy constructor and the copy-assignment operator:

  struct GameState
  {
    std::vector< AABB<int> > pieces;
    std::shared_ptr<VoxelVolume<int> > volume;
    
    GameState(IntAABBList pieces, std::shared_ptr<VoxelVolume<int> > volume);
    std::shared_ptr<GameState> copy() const;
    GameState(const GameState&) = delete;
    GameState& operator=(const GameState&) = delete;
  };

The copy function looks like this:

  std::shared_ptr<Jengatris::GameState> Jengatris::GameState::copy() const
  {
    std::vector<AABB<int> > p(pieces.begin(), pieces.end());
    return std::make_shared<Jengatris::GameState>(p, volume->copy());
  }

Note that the Voxel Volume has a copy function as well. You can see the whole code on GitHub: algoDeSwift/AoC2023.

We have done the most important thing for concurrency. Now let’s see how to run the simulations in parallel.

Parallel loop in Swift with GCD

The sequential solution in Swift can be written functionally with the ‘reduce’ function (remember that the answer to part 2 of the problem is the sum of all possible simulated scenarios):

  static func countFalls(state: GameState, ids: Set<Int>) -> Int {
      return ids.reduce(0) { sum, i in
          let (_, n) = Jengatris.simulate(start: state, without: i)
          return sum + n
      }
  }

It is quite straightforward to rewrite a loop as a parallel loop with GCD:

  static func concurrentCountFalls(state: GameState, ids: Set<Int>) -> Int {
    let indexArray: [Int] = Array(ids)
    var counts = [Int].init(repeating: 0, count: indexArray.count)
    DispatchQueue.concurrentPerform(iterations: indexArray.count) { iteration in
        let id = indexArray[iteration]
        let (_, n) = Jengatris.simulate(start: state, without: id)
        counts[iteration] = n
    }
    return counts.reduce(0, +)
  }

Notice that we created a shared resource, ‘counts’, where we save all the intermediate results. But we don’t need a mutex for this because we aren’t resizing the array and each thread only writes to its own unique position, given by ‘iteration’. The number of threads is decided automatically by GCD depending on the number of cores of the CPU and the capabilities of the hardware. On my Mac mini M1 it creates 8 threads.

Parallel loop in Swift with async/await

As I mentioned in the introduction, Swift 5.5 introduced keywords for asynchronous code: ‘async’ and ‘await’. Note that just because code is asynchronous it doesn’t necessarily mean it runs in parallel. For instance, Javascript is very asynchronous, but it’s mostly single-threaded (unless you are using Workers): during each “tick” of the event loop the different callbacks get executed.

But concurrent code is asynchronous by nature, so it is useful to have first-class asynchronous keywords in the language to write concurrent code. You just need to flag a function with ‘async’ to mark it asynchronous. Then you can use ‘await’ to wait for something to finish before continuing with the rest of the code, without actually blocking the execution of the program.

There’s a special function called ‘withTaskGroup’ that is very helpful for our example. It creates a Task Group to which you can keep adding Tasks. In our case each task will be one of the simulations. Then we simply wait for all the results to come back. Here’s the code:

  static func countFallsAsync(state: GameState, ids: Set<Int>) async -> Int {
      var sum = 0
      await withTaskGroup(of: Int.self) { group in
          for id in ids {
              group.addTask {
                  let (_, n) = Jengatris.simulate(start: state, without: id)
                  return n
              }
          }
          for await n in group {
              sum += n
          }
      }
      return sum
  }

Here we don’t have to worry about the number of threads created either. The system will choose for us. The performance should be the same as with GCD.
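As an aside, since countFallsAsync is an async function, it has to be called from an asynchronous context. A minimal way to drive it from a command-line tool could look like the sketch below, where ‘parsePuzzleInput’ is a hypothetical helper that builds the initial game state and the set of essential piece IDs (it is not part of the code shown above):

  // Minimal driver sketch; `parsePuzzleInput` is hypothetical, and the
  // enclosing type of countFallsAsync is assumed to be `Jengatris`,
  // as in the C++ version.
  @main
  struct Day22 {
      static func main() async {
          let (state, essentialIds) = parsePuzzleInput()
          let total = await Jengatris.countFallsAsync(state: state, ids: essentialIds)
          print("Part 2: \(total)")
      }
  }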

Parallel loop in C++ with threads

Let’s see how the C++ code compares to Swift. Let’s start by writing down the sequential version. Instead of using the ‘reduce’ function, I used a for-loop for this one because I think it’s easier to read:

  size_t Jengatris::countFalls(const GameState& state, const std::unordered_set<int>& ids)
  {
    size_t sum = 0;
    for (const auto& id : ids)
    {
        std::unordered_set<int> moved;
        auto s = state.copy();
        auto _ = Jengatris::simulate(*s, id, &moved);
        sum += moved.size();
    }
    return sum;
  }

A simple solution that works is to create a thread for each element in the input. That creates lots of threads, though. My input has 974 entries, so that’s 974 threads. The code below creates a lambda function with the work each thread needs to do, creates all the threads, and then waits for all of them to finish with ‘join’:

  size_t Jengatris::countFallsThreaded(const GameState &state, const std::unordered_set<int> &ids)
  {
    std::vector<int> idArray(ids.begin(), ids.end());
    std::vector<size_t> counts(ids.size());
    std::vector<std::thread> threads;
    auto parallelWork = [&state, &idArray, &counts](int iteration) {
        int id = idArray[iteration];
        std::unordered_set<int> moved;
        auto s = state.copy();
        auto _ = Jengatris::simulate(*s, id, &moved);
        counts[iteration] = moved.size();
    };
    // this will start MANY threads!! (974 threads for my input)
    for (size_t i = 0; i < idArray.size(); i++)
    {
        threads.emplace_back(parallelWork, i);
    }
    // Wait for threads to finish
    for (auto& thread : threads) {
        thread.join();
    }
    return std::accumulate(counts.begin(), counts.end(), 0);
  }

Parallel loop in C++ with async

In C++ ‘std::async’ is a function template used to run a function asynchronously, potentially in a separate thread which might be part of a thread pool. This is different from flagging a function asynchronous with ‘async’ in Swift.

This can be used in combination with futures to wait for results. Javascript programmers may be familiar with the concept from Promises; in fact, an std::future can also be paired with an std::promise. But here we are interested in the asynchronous function. My code looks like this now:

  size_t Jengatris::countFallsAsync(const GameState &state, const std::unordered_set<int> &ids)
  {
    std::vector<int> idArray(ids.begin(), ids.end());
    std::vector<std::future<size_t>> futures;
    auto parallelWork = [&state, &idArray](int iteration) {
        int id = idArray[iteration];
        std::unordered_set<int> moved;
        auto s = state.copy();
        auto _ = Jengatris::simulate(*s, id, &moved);
        return moved.size();
    };
    // Start asynchronous tasks
    for (size_t i = 0; i < idArray.size(); ++i) {
        futures.push_back( std::async(std::launch::async, parallelWork, i));
    }
    // Wait for tasks to finish and accumulate the results
    // When I put a breakpoint here, Xcode says there are 372 threads.
    // Still lots, but less than 974...
    size_t total = 0;
    for (auto& future : futures) {
        total += future.get();
    }
    return total;
  }

The code is a bit ugly, though. We can do better with Threading Building Blocks (TBB).

Parallel loop in C++ with C++17 and TBB

C++17 does include parallel algorithms, such as transform_reduce. You can pass an execution policy, and if you specify std::execution::par, it should run in parallel. So my parallel loop can be cleanly written like this:

  size_t Jengatris::countFallsParallel(const GameState &state, const std::unordered_set<int> &ids)
  {
    return std::transform_reduce(
        std::execution::par,
        ids.begin(),
        ids.end(),
        size_t(0),
        std::plus<>(),
        [&state](int id) {
            std::unordered_set<int> moved;
            auto s = state.copy();
            auto _ = Jengatris::simulate(*s, id, &moved);
            return moved.size();
        }
    );
  }

However, that code does not compile with Apple Clang. If you check the compiler support page, parallel algorithms are not supported in Apple Clang. To compile it on macOS, I installed GCC with Homebrew, and TBB (see below). The code compiles and runs; however, it doesn’t seem to run in parallel, since the performance is the same as the sequential version. So I rewrote it with Intel TBB.

TBB is a C++ template library developed by Intel for parallel programming. The code got slightly uglier, but here’s the same parallel loop:

  size_t Jengatris::countFallsTBB(const GameState &state, const std::unordered_set<int> &ids)
  {
    std::vector<int> idArray(ids.begin(), ids.end());
    size_t sum = tbb::parallel_reduce(
         tbb::blocked_range<size_t>(0, idArray.size()),
         size_t(0),
         [&](const tbb::blocked_range<size_t>& range, size_t localSum) {
             for (size_t i = range.begin(); i != range.end(); ++i) {
                 int id = idArray[i];
                 std::unordered_set<int> moved;
                 auto s = state.copy();
                 auto _ = Jengatris::simulate(*s, id, &moved);
                 localSum += moved.size();
             }
             return localSum;
         },
         std::plus<>()
     );
     return sum;
  }

For some reason, GCC installed via Homebrew doesn’t automatically find the TBB headers and libraries, so I had to point to them manually. For reference, my compiler command is:

g++-13 -std=c++17 -O3 -Wall -Wextra -pedantic -o advent advent2023-cpp/*.cpp -ltbb -I/opt/homebrew/Cellar/tbb/2021.11.0/include/ -L/opt/homebrew/Cellar/tbb/2021.11.0/lib

Benchmark

I’ve collected some numbers, mostly to verify that indeed the parallel code runs faster. I’ve compiled both Swift and C++ versions in Release mode optimizing for speed, not size, and averaged the values of a few runs. See the graph below.
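(By the way, if you want to take your own measurements in Swift, ContinuousClock is an easy way to time a run. The snippet below is just a sketch, not necessarily the exact harness behind the numbers; ‘state’ and ‘essentialIds’ are assumed to be set up as in the earlier sections, and countFalls is assumed to live in the Jengatris type.)

  // Timing sketch using ContinuousClock (Swift 5.7+).
  let clock = ContinuousClock()
  var result = 0
  let elapsed = clock.measure {
      result = Jengatris.countFalls(state: state, ids: essentialIds)
  }
  print("countFalls = \(result), took \(elapsed)")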

Running times of Jengatris with different implementations and compilers
Running times of “Jengatris” with different implementations and compilers

A few observations:

  • Any version runs much faster than GDScript, which takes 24 seconds.
  • The C++ code runs a bit faster than Swift.
  • Clang performs slightly faster than GCC.
  • All the parallel implementations perform similarly to each other, in their respective languages/compilers, except for the “GCC parallel” which doesn’t seem to be working as expected.
  • The C++ solution with many threads seems slightly slower than the other C++ solutions in GCC, presumably because of the overhead of creating that many threads, although it doesn’t seem to make a difference in Clang.

Summary

There are multiple ways to do computations in parallel in both Swift and C++, some approaches more modern than others. However, always prepare some benchmark to actually check that the solution works. You may get a surprise like the one I got with C++17 (if someone knows why the parallel execution is being ignored, please let me know).

In the C++ world, go to the Compiler Support page to find out which features of the standard are actually implemented in the compiler you are using. If you are targeting multiple platforms, you may not want to use the parallel algorithms from C++17.

C++ sounds a bit scarier than Swift, but I hope these comparisons show that C++ doesn’t need to be that much more verbose.

Thanks to Eric Wastl for Advent of Code.

All the code can be found on GitHub: algoDeSwift/AoC2023.


Mantis Shrimp: Image Differences with Metal shaders
Mon, 26 Feb 2024 11:53:27 +0000
Mantis Shrimp image diff tool for macOS

Image Diffs and Mantis Shrimp

An image diff is an image that visualizes the difference between 2 other images in some manner. There are many image diff tools around, but I often find myself wanting to write my custom difference operator, depending on the thing I'm looking for in the image.

A mentor of mine once created an internal tool in Javascript to do image diffs, where you could write a snippet of Javascript. It was very useful, but the code ran on the CPU for each pixel, so it was quite slow. Also, it could only deal with the types of images that the browser could handle, that is, usually just 8-bit images in sRGB color space. He called this web app Mantis Shrimp, after one of his favorite animals. The reason: mantis shrimps have up to 16 different types of photoreceptor cells. In comparison, humans have just 3 different types of cells (although some people have tetrachromacy). But who needs that many types of photoreceptors when we have technology and software to enhance what we see?

I borrowed that awesome name for my Mantis Shrimp app, although this app can do more than the original. It can compute image diffs of any 2 images that macOS supports, that is, up to 32 bits per color channel, and different types of color spaces. It does so in real time because the operations happen on the GPU, so pixel operations are done in parallel, not sequentially. The app comes with different preset operators, but you can write your own with Metal shaders as well.

Because you can write shaders with it, you can do much more than just image differences. You can even create animations with it, pretty much like what Shader Toy does in the WebGL world.

Here's a 30-second video summary of what Mantis Shrimp can do:

In this article I’m going to give you some details about the actual implementation of Mantis Shrimp.

SwiftUI and Metal

At WWDC 2023 Apple announced new functions to modify a SwiftUI view with custom shaders: colorEffect, layerEffect, and distortionEffect. Distortions modify the location of each pixel, whereas the other two modify its color. I assume they must be fragment/pixel shaders. You can find some nice examples in How to add Metal shaders to SwiftUI views using layer effects - a free SwiftUI by Example tutorial.
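For reference, here’s a rough sketch of how one of those modifiers is used (generic SwiftUI, not Mantis Shrimp code). It assumes there is a [[stitchable]] Metal function called ‘invertColor’, taking a position and a color, in the app’s default shader library:

  import SwiftUI

  struct InvertedPhoto: View {
      var body: some View {
          Image("photo")  // hypothetical asset name
              // Runs the Metal function once per pixel; `invertColor` is a
              // hypothetical [[stitchable]] function in the default library.
              .colorEffect(ShaderLibrary.invertColor())
      }
  }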

However, if you want to do something more complex than that and you plan to use SwiftUI, you will need to create a custom UIViewRepresentable (or NSViewRepresentable on macOS). You can find an example of this in the Apple forums: MetalKit in SwiftUI.

For Mantis Shrimp I followed that route, and I encapsulated all the rendering using the Renderer class of my VidEngine, an open-source graphics engine I created a couple of years back. VidEngine uses Swift and Metal, but at the time of writing I haven’t released the changes to make it work with macOS and SwiftUI.

Mantis Shrimp render passes

In VidEngine a “render graph” is simply an ordered list of what I call “plugins”. A plugin encapsulates one or more render passes. A real render graph would be a Directed Acyclic Graph where the nodes are render passes and each node is connected to others through read and write dependencies between the resources they use. One of the best explanations of render graphs I’ve found is in this blog: Rendergraphs and how to implement one.
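VidEngine’s actual Plugin type is not reproduced here, but conceptually the idea is something like this sketch (the names are mine, not the engine’s):

  import Metal

  // A conceptual sketch, not VidEngine's real API: each plugin encapsulates
  // one or more render passes, and the "render graph" is the ordered list.
  protocol PluginSketch {
      func draw(commandBuffer: MTLCommandBuffer)
  }

  final class RenderGraphSketch {
      private var plugins: [PluginSketch] = []
      func add(_ plugin: PluginSketch) { plugins.append(plugin) }
      func render(commandBuffer: MTLCommandBuffer) {
          // In a real render graph the order would be derived from read/write
          // dependencies between resources; here it is simply insertion order.
          for plugin in plugins {
              plugin.draw(commandBuffer: commandBuffer)
          }
      }
  }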

Because there are only 3 plugins in Mantis Shrimp, the dependencies are hard-coded. One of the plugins is for Mesh Shaders, which I will discuss in a separate article. Most of the time that plugin is disabled. The other two plugins are the DiffPlugin and the OutPlugin.

The DiffPlugin is where the actual operation happens. It consists of a simple vertex shader that draws a full-screen rectangle, and a fragment shader with the per-pixel operation. This fragment shader can be replaced with your own code. Apart from the texture coordinates of each pixel, I pass some other variables, such as the time in seconds, so you can create animations. Read the manual for details.

The DiffPlugin writes the output to an image that has the same size, bit depth, and color space as the first input image. You can only export images as PNG at the moment, but the export should preserve the size, bit depth, and color space.

What you see on screen, though, is what the OutPlugin shows you. Its input is the output of the DiffPlugin, and it adapts it to the current view. By default it uses point sampling, so if your image is only a few pixels wide, you should see a pixelated image, not a blurred one (as it would be if it were linearly interpolated). This is important because in an image diff tool you want to see the details, not a blurred version of them! The view supports the Display P3 color space by default, but the pixel format that gets selected may vary depending on the hardware.

The OutPlugin may also apply the final gamma where necessary. Some pixel formats support the sRGB flag for automatically applying the gamma (or the inverse gamma when reading), but not all pixel formats support it, and support varies depending on the hardware, so the operation needs to be done in a shader.
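For reference, the kind of “gamma” being applied here is typically the sRGB transfer function, applied per channel. Here it is in Swift just to show the math; in the app the equivalent operation would happen per pixel in a shader:

  import Foundation

  // Linear-to-sRGB transfer function ("gamma"), applied to each color channel.
  func linearToSRGB(_ c: Double) -> Double {
      return c <= 0.0031308 ? 12.92 * c : 1.055 * pow(c, 1.0 / 2.4) - 0.055
  }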

A diff fragment shader

A simple difference operator looks like this:

fragment half4 main()
{
    float4 a = texA.sample(sam, frag.uv);
    float4 b = texB.sample(sam, frag.uv);
    float4 diff = abs(a-b);
    float4 out = float4(uni.scale * diff.rgb, a.a);
    return half4(out);
}  

The signature of the function is predefined, and “main” is just a shortcut I’ve defined in Mantis Shrimp, because the function signature can’t be overridden. The actual signature looks like this:

fragment half4 diffFragment(VertexInOut frag [[stage_in]],
   texture2d<float> texA [[ texture(0) ]],
   texture2d<float> texB [[ texture(1) ]],
   sampler sam [[ sampler(0) ]],
   constant Uniforms& uni  [[ buffer(0) ]])

So apart from the fragment texture coordinates, you get two textures, a texture sampler, and some extra variables. The operation above is just subtracting the RGB values of both textures and setting the output to be the absolute value of the difference. Here’s a summary of the different diff presets in Mantis Shrimp:

Mantis Shrimp image diff presets

When no image is assigned, a white texture is sampled by default. That means the default RGB diff operator acts as a negative if you only assign one image, since the absolute difference with white, abs(1 − b), is the inverted image. See the example below.

Negative painting from Iranian-British artist Soheila Sokhanvari. By default Mantis Shrimp will negate the input.

A shader sandbox

Mantis Shrimp can also be used to simply test shaders. People familiar with Shader Toy or TwiGL will know that you can create beautiful animations with just a fragment shader.

A common mathematical tool for that purpose is the use of Signed Distance Functions (SDF). An SDF is a function that tells you how far a point is from the surface of the object. When you are inside the object, the distance is negative, hence the “signed”. Because in a fragment shader you get the (u,v) texture coordinate of the output, you can use an SDF to draw simple 2D figures. For instance, a circle centered at (0,0) is just the length of the (u,v) vector minus the radius of the circle.

If you apply transforms to the (u,v) coordinates, you can do fancier things. One common transformation is to multiply the (u,v) coordinates by a number greater than one and then take the fractional part, the decimals. In this manner, you get repeating coordinates that go from 0 to 1, and then from 0 to 1 again. If you use the time variable to change these transforms over time, you can create some interesting animations. Mantis Shrimp comes with this SDF animation preset to get you started:

float sdCircle(float2 p, float r)
{
    return length(p) - r;
}

float2x2 rotationMatrix(float angle)
{
    float s=sin(angle), c=cos(angle);
    return float2x2(
        float2(c, -s),
        float2(s,  c)
    );
}

fragment half4 main()
{
    float t = uni.time;
    float aspect = uni.resolution.x / uni.resolution.y;
    float2 uv0 = frag.uv * 2 - 1;
    uv0.x *= aspect;
    float2x2 r = rotationMatrix(cos(t));
    uv0 = r * uv0;
    float2 uv = fract(2 * uv0) - 0.5;
    float d = sdCircle(uv, 0.5) * exp(-length(uv0));
    float s = uni.scale + 1;
    d = sin(d*s + t) / s;
    d = 0.01 / abs(d);
    float2 uvImage = 0.5 * float2(sin(t) + 1, cos(t) + 1);
    float4 color = texA.sample(sam, uvImage);
    float4 out = float4(d * color.rgb, 1);
    return half4(out);
}

The output looks like this:

SDFs can also be used to represent 3D surfaces. The SDF for a sphere is the same as for a circle, but we use the length of an (x,y,z) coordinate instead of a 2D coordinate. Usually these 3D SDFs are combined with a technique called Ray Marching, which consists of casting a ray for every (u,v) coordinate on the screen, starting at the near plane of the camera frustum, and advancing the ray along its direction based on the value of the SDF. Remember that the SDF tells you the distance to the surface, so you know how far you can safely move.
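As a sketch of the idea (written here in Swift rather than in a shader, and simplified to a single unit sphere; this is not Mantis Shrimp code):

  import simd

  // Signed distance from point p to a sphere of the given radius at the origin.
  func sdSphere(_ p: SIMD3<Float>, _ radius: Float) -> Float {
      return simd_length(p) - radius
  }

  // March a ray from `origin` along `dir` (normalized) until the SDF says we
  // are close enough to the surface, or give up. Returns the hit distance.
  func rayMarch(origin: SIMD3<Float>, dir: SIMD3<Float>) -> Float? {
      var t: Float = 0
      for _ in 0..<100 {                       // maximum number of steps
          let d = sdSphere(origin + t * dir, 1.0)
          if d < 0.001 { return t }            // close enough: we hit the surface
          t += d                               // safe to advance by the SDF value
          if t > 100 { return nil }            // the ray escaped the scene
      }
      return nil
  }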

There are plenty of resources online to learn about this. Check out Iñigo Quilez’s home page. He’s the creator of Shader Toy and he has many interesting resources. The important thing for this article is to highlight that you can use Metal shaders in Mantis Shrimp to create this kind of animation (or “demo”) as well. See these ray-marched cubes (not marching cubes!):

Here's the shader code for the cubes example: Genuary 29-sdf-raymarching.metal. You can find other examples I did for the #Genuary challenge in that folder: endavid/Genuary2024.

Beyond fragment shaders

Given that the original intent of this app was simply to compare 2 images, it felt strange to allow custom vertex shaders. What would be the point if you can’t change the geometry? I would need a way to upload geometry to Mantis Shrimp. But then it would be a model viewer, rather than an image diff tool!

However, being able to play with shaders that do more generic things programmatically is still attractive. That’s why I added support for mesh shaders in version 1.1 of Mantis Shrimp. I will discuss this in the next article, but the basic idea is that you have a mesh shader with no geometry at all as input, and you create your own geometry programmatically in the shader. So you can create 3D graphics procedurally, without necessarily using ray marching and SDF functions in a fragment shader. Here’s an example of some cubes generated in a mesh shader: Genuary 10-cubes.metal.

If you use Mantis Shrimp and you like it, please leave me a comment on the App Store. And if you post creations on Twitter or Instagram, use the hashtag #MantisShrimpApp so I can find them 😊

Happy coding!


Troubleshooting “Disk not ejected properly” on a LaCie USB-C HDD
Tue, 20 Feb 2024 17:38:14 +0000
Disk not ejected properly on a Mac Mini
Trying out different cables on a LaCie drive to get rid of ejection errors

Endless “Disk not ejected properly” popups

On January 1st I woke up my Mac Mini M1 from its sleep and found hundreds of popups saying “Disk not ejected properly”. I had left my external USB-C LaCie Rugged HDD 5 TB drive plugged in, and something had gone horribly wrong.

The first annoying thing is getting rid of the popups. I think there was a popup every 2 minutes; over 12 hours, that’s 360 popups 😣 Because they are system notifications, they are not grouped into a single app, so you can’t batch-dismiss them. After some searching, I found this magic CLI command to get rid of them all:

  killall NotificationCenter

But that didn’t make the problem go away.

How to reproduce

If you search for that message, you will find people with tons of different problems. If you call Apple support, they’ll make you go through several standard procedures, but you will need a test case to verify whether anything works. In my case, the issue can be reproduced 100% of the time with the following steps:

  1. Plug the USB-C HDD into the USB-C port of the Mac (any of the ports)
  2. Make sure the drive works by reading a file in it.
  3. Go to the Apple menu and set the Mac to Sleep.
  4. Wait 4 minutes.
  5. Make sure the drive turns off (I touch it to make sure it stopped spinning)
  6. Wake up the Mac. At that point, I get one “Disk not ejected properly” popup.

You will need some patience to do tests, but it’s important to have a test case that you can reproduce every time.

Things can get worse

I ignored the issue because it only happened when the computer went to sleep, so I would disconnect the drive when I finished work. In 2020 I had a similar issue, but at the time I solved it by disabling “Put disks to sleep whenever possible” in the macOS power settings. But that option didn’t help this time.

But one day things got worse: I got “Disk not ejected properly” every 2 minutes even while I was working with the disk and the computer was awake. At that point, I called Apple Support (chatted twice, called twice).

Troubleshooting with Apple Support

Unplugging and plugging the drive again didn’t fix the continuous disconnections. After running First Aid on the drive and verifying it’s fine, the Apple support team told me to start the Mac in safe mode, and restart again. That fixed the most pressing problem of the drive disconnecting every 2 minutes, but the issue of the drive disconnecting when the computer goes to sleep persisted.

The other thing they will tell you to do is to create a test admin user and try from that account. The issue persisted for me.

Apple support suspected the cable and the disk, but I wasn’t totally convinced: why would restarting the Mac fix the most pressing issue, the ejections every 2 minutes? A few weeks after that first call with Apple, I got the same problem again and fixed it with a restart. Why does unplugging and replugging the drive not fix it, while a computer restart does? I suspected the driver, or my Mac (spoiler: I was wrong).

Anyway, remember that the issue wasn’t totally fixed with a restart, because the drive still got abruptly disconnected when the Mac went to sleep.

Seagate (lack of) customer support

Because the Apple engineer suspected the cable that came with my LaCie disk, I tried to contact LaCie. LaCie support is now part of Seagate support. They have a chatbot, and eventually you can chat with a person (or perhaps it wasn’t one?). I pasted all my notes to an agent and got this: “I understand that you are having issue while connecting to computer. Am I correct?” 😅 Didn’t I give enough detail?

After 30 minutes, the only thing I got was a suggestion to run First Aid. I can use Google and ChatGPT, thanks for nothing 😑 They also let me know that they don’t offer support by email to individual users… (In 2017 I had a similar issue with another of their drives, and I did all the interactions by email at the time.)

Battery of tests

I’m going to list down all the things I did to try to isolate the problem. Remember I didn’t know yet what caused it.

  • Connect the HDD to the Mac mini USB-A port with a USB-C to USB-A cable (the one that comes with the PS5 controller). I left the drive plugged all night and it worked perfectly, so one would think that the drive is fine.
  • Connect the HDD with that USB-C to USB-A cable to a Macbook Pro with macOS Catalina. It works perfectly. Unfortunately that Mac doesn’t have a USB-C port so I can’t test the other cable.
  • Connect the HDD to the Mac mini USB-C port with a USB-C cable from an Android phone. With that cable, I don’t get alerts saying that it got disconnected, but all of a sudden the files can’t be read. It seems more dangerous, because it appears connected, but actually the files don’t work. I got “The file xxx could not be opened” when I tried to read any file, and I got some corrupt files while I was writing to the drive. So it’s not really working and it’s super dangerous ⚠️
  • Update from macOS Sonoma 14.2.1 to 14.3. It didn’t fix the issue.
  • I tried plugging the HDD with the USB-C to USB-A adapter with a USB hub connected to the USB-C port and the drive didn't turn on! I remember it working at some point in the past… Any other device works, though, so perhaps there's not enough power for this drive?

After these tests, I headed to the Apple Store and I did more tests with an engineer at the Genius Bar:

  • Using the LaCie USB-C cable, we connected the HDD to one of their Macs, a Macbook Air M2 running Sonoma 14.1.2. Doing the sleep test, the issue reproduced. So it seems unrelated to my Mac mini. It’s either the cable, the disk, or the drivers.
  • Using a Thunderbolt 4 data cable (£75) the error happened again 😞 So it wasn’t a problem exclusive to my cable.
  • Using a 2m Apple 240W thunderbolt charge cable (£29) the error DOES NOT happen 🎊

But why? The more expensive data cable says it’s 100W, so it seemed to be related to the power. For the time being, I got the thunderbolt charge cable. I left it plugged in all night and it worked correctly. But I wanted to know what was going on. And did I really want to use that cable? (It turns out that that cable wasn’t ideal.)

Handshaking in a 3rd call with Apple

I called Apple support again. The engineer explained that when you connect the drive to the computer, there’s a handshake that determines whether to use Thunderbolt or regular USB-C. Even if the port looks the same, Thunderbolt and USB-C are actually different connections. The Mac must think it’s Thunderbolt even if I use a regular USB-C cable, and that’s when it becomes unstable. When it goes to sleep, it tries to keep a record of the connection so it doesn’t have to do another handshake or establish a new connection again. Then, when it wakes up, it sends data as if it were Thunderbolt, when it’s not.

That’s the explanation I got, and they suggested I keep using the 240W thunderbolt charge cable. That doesn’t seem to explain why it didn’t work with the Thunderbolt 4 cable, though.

Apple told me to speak to the manufacturer.

Speed tests with Seagate support

I explained the whole thing to Seagate customer support again, and I got this reply:

“Upon checking the issue is from cable not from drive. According to your statement, you have tried with different PC and different cable. You will get the pop up like disk not ejected properly except this 240W thunderbolt charge cable. Kindly use your drive with 240W thunderbolt charge cable. Unfortunately your drive is supported only with this cable - 240W thunderbolt cable.”

I asked why they sell the drive with a cable that doesn’t work, then, but I got no reply.

The chat is quite surreal and sometimes I wondered whether I was speaking to a human. Then again, if it were ChatGPT the grammar would be correct, so I do believe they are human. After 1 hour and 40 minutes in the chat, I managed to speak to a more helpful person who asked me to do some speed tests.

If you are doing this, I recommend using a big file to get consistent results. I zipped my local Movies folder, and that gave me a 1GB ZIP file. To test the copying speed from the Mac mini SSD drive to the external drive, I used rsync. You can use it like this (“LaCie 5TB” is the name of my HDD):

  rsync -ah --progress ~/Movies.zip /Volumes/LaCie\ 5TB

I tried with the Apple cable I got in the Apple store, and with the original LaCie cable:

  • With 240W thunderbolt charge cable: 36.87 Mbytes/sec
  • With USB-C to USB-A adapter cable (for reference): 38.21 Mbytes/sec
  • With USB-C cable from LaCie: 123.62 Mbytes/sec

So it’s much slower with the charge cable, as slow as using USB-A 😢 But I couldn’t reliably use the LaCie cable. Or the £75 Thunderbolt 4 data transfer cable. Seagate says the drive is compatible with both Thunderbolt 4 and USB-C. The Mac Mini M1 port is Thunderbolt 3. Seagate escalated this issue and they told me to wait 24 hours.

Perhaps the HDD needed more power than it should and that’s why it wants the 240W cable? Or perhaps the operating system or driver doesn’t know how much power it needs to give to the HDD? Shouldn’t the driver somehow detect this? And the macOS UI certainly could offer a better way to dismiss those 360 popups…

On the Seagate help page on this issue they say many users have reported this “after updating the macOS”, so I thought perhaps it could be Sonoma-related. Read “Disk not ejected properly on Mac”.

I decided to speak again with Apple Support.

4th call with Apple Support: USB 2.0 & HFS+

In this last call with Apple I shared my screen and they helped me troubleshoot and install some things:

  • We installed the LaCie tool kit to see if they provide firmware updates, but there aren’t any. See LaCie Downloads and Firmware downloads.
  • They recommended me to use the Blackmagic Disk Speed Test app for testing the disk speed.
  • They asked me to format the drive again with GUID Partition Map and HFS+ (Mac OS Extended, Journaled), with the Disk Utility in macOS Sonoma. I was using APFS, but that’s only recommended for SSD drives, and there is no advantage to using it on HDDs.

We didn’t manage to fix the problem, but I got faster transfers after the formatting. I was getting about 85 Mbytes/sec with APFS (using the Disk Speed Test app; my rsync test was faster), but something around 130 Mbytes/sec with HFS+. See below.

Speed tests of my HDD LaCie drive using different cables and file systems

The other thing I learned is that the 240W charging cable is USB 2.0, not USB 3.0. That’s why it’s slower. In contrast, the Thunderbolt 4 data cable supports USB 3.0 at up to 10 Gbit/s, and it also supports USB 4.0, which is 40 Gbit/s.

So the drive worked when I used USB 2.0, in either the USB-A or the USB-C port, but started failing otherwise. I had to speak to Seagate again.

Drive Replacement

After another support chat session with Seagate, they told me there must be something wrong with the drive. Since it was still under the 2-year warranty, they asked me to send it back to them, to an address in the UK (tracked shipping, about £8).

A week later I got a new drive.

I did all the tests several times. I also left the computer sleeping for 30 minutes to be really sure. The problem was gone 🥳

Just some notes, in case someone uses the same drive:

  • I ran the LaCie Setup that comes preinstalled in the drive, and the drive got formatted with GUID Partition Map and HFS+, which matches Apple's suggestion 👍
  • The Setup app redirects you to lyvetst.seagate.com/?sn=… to register the drive, but that URL does not exist. I edited the link to point to lyveint.seagate.com instead, and that worked.
  • Disk Speed Test says the speed is: Write 141.2 MB/s, Read 135.5 MB/s 👍

Happy drive now 🥳

Summary

Debugging hardware issues is really a pain. Sometimes it’s easier to live with a faulty drive than to try to get to the bottom of the problem.

However, I must say that Apple support was very helpful and that I learned lots of things in the process. I can’t say the same about Seagate, because we still don’t know which piece of their hardware caused such strange behaviour. They sent me a working refurbished drive in the end, but I spent a lot of time and some money resolving an issue that was their fault to begin with.

In any case, I hope this article helps other people struggling with the infamous “Disk not ejected properly” error to troubleshoot the issue and figure out whether the problem is caused by the drive itself or not.

Thank you very much to the people in Apple support and Seagate support who gave me all the information I’ve mentioned in this article.


My retrospective of 2023
Sun, 21 Jan 2024 21:47:48 +0000
Happy new year & a visual summary of my 2023
Happy new year 2024 🐲 & a visual summary of my 2023

My 2023 as a story

2023 has been another year of ups and downs. The worst happened at the end of the year, when I lost my job. I’ve already blogged about it in “8+ years programming for fashion”. The best of 2023 happened in June, when I got married 👨‍❤️‍👨. But many other good things happened as well.

I’ve been quite busy with job interviews this month, and it feels like I’m lagging behind on lots of things. I finished Advent of Code in January, and I’m 12 days behind on Genuary. But I guess it’s good to be busy. Still, I didn’t want January to end without having written a retrospective of 2023. I think it’s always a good exercise to stop and look back, especially to celebrate the achievements. Otherwise it’s easy to get lost in the worries of the present, as if everything were always dark.

Speaking of darkness, I can’t believe that last year we spent the new year in Tel Aviv, and this year there’s a war going on there… It seems the number of ongoing wars only keeps increasing… ☹️ Let’s go back to positive things.

Apart from our wedding, the other highlights of my 2023 were the tripS (plural) to Japan. After almost 5 years, I managed to go to Japan and renew my residence card. I had been quite stressed about that since the beginning of the pandemic, so I’m very happy I was able to sort it out.

On the first trip, I went for 3 weeks, in February & March. I took some days off, but I also worked remotely. I woke up at 4am, worked from 6am to 10am, went out & enjoyed the daylight, and then worked again from 4pm to 10pm. During those evening hours I could have meetings with the team in the UK. I felt a bit tired, but I wasn’t sleepy. I think it was all so exciting that it felt quite nice to make the most of my days.

One random day I saw a poster for a Kabuki play of Final Fantasy X. Kabuki is a traditional all-male theater, so at first I thought it was quite funny to see the male actors “cosplaying” as some of the female leads. But then I realized this was a unique opportunity, and I was lucky enough to be there at the time, since it was shown only for a month and a half. So I bought a ticket, took an extra day off, and went to see it. It was very long, from 12pm to 9pm, but it was really worth it. One of the best things I’ve ever seen. I was very moved. I wrote about it in my “otaku” blog (in Spanish, though): Final Fantasy X Kabuki.

I visited Japan again in May. It was a very short trip, just 3 nights, tagging along on my partner’s work trip. But our flight back to London was cancelled, so we stayed 2 extra nights. Perfect for me 😂 I had brought my work laptop, and there was a WeWork office next to the hotel, so I went there one day. I was able to meet some friends I couldn’t see in March. Again, I felt I made the most of my time there. I was very tired after the long flight, but I arrived in the morning and went straight to Akihabara, met a friend who showed me some interesting retro places, did some shopping, and then we went back to Yokohama, where we had a really good time in a local sushi restaurant. I uploaded a YouTube video (Spanish again, but with English subtitles) of the conversation we had in that restaurant about my time in the Japanese game industry: Gossip: game devs in Japan, with Ahgueo & Fibroluminique.

As I said earlier, I got married/civil-partnered in June. Although nothing has changed (we’ve been together for 14 years), it was nice to have some friends & family over to celebrate. It was a small ceremony, but we all had a great time. My brother came with his family, and I spent a few fun days with them in London.

In September, we visited Barcelona and had another celebration there with friends and family, in a Catalan masia. Then my sister visited us in November for my birthday, and we saw the early Christmas lights in London & ate lots of delicious Asian food & Japanese desserts. For Xmas, we went to Barcelona again and had a lovely time there. We usually avoid travelling at Xmas, because flights are always very expensive and crowded, so it was nice to spend Xmas back home after a while.

Time to rewind and list the achievements.

Work

Lots of Generative AI this year. I thought we were doing quite well, but stuff happens… Read the summary here: “8+ years programming for fashion”.

Indie development

I’m still demotivated about game development. I have some games I want to build in my backlog, but it feels a bit pointless if no one is going to play them… Still, I did release some things:

  • I released Syllabits for the browser, Windows and Mac. It’s on itch.io. Almost every Friday evening we’ve been playing it at work, in Practice mode. I’m glad they liked it at my office! 🥹
  • In summer I released maintenance updates for all my iOS games. No new features.
  • In December I released a new macOS app called Mantis Shrimp. It was listed 5th in the Top Paid macOS Developer Tools this month, although only briefly! It was something I was building on the side and using at work to compare images. I’ve been doing the Genuary challenge with it, and it’s fun. See my #Genuary1 on Instagram.

Blogging

I didn’t write any technical articles in 2023 😓 Well, I did write one article for my company, but it didn’t get the green light, so it will remain unpublished.

In the Spanish geek blog, the only interesting thing is the article about the FFX New Kabuki I mentioned earlier.

Vlogging

This year I decided to spend more time gaming, and I streamed most of my playthroughs on Twitch. I did upload some YouTube videos as well, but I intended to do more technical ones and failed to deliver. My videos are usually in Spanish, but I’ve added English subtitles to some on request.

  • I did a couple of rants, on ChatGPT and on WWDC23, in my “serious” channel: @endavidg.
  • I uploaded a couple of solutions to Advent of Code 2022 (but not 2023 yet…) in my channel on Swift programming: @algoDeSwift.
  • I did upload lots of videos to @focotaku 🙈 But they were mainly edits of my play-through of Final Fantasy XIII from 2022. I thought it was good for learning, because I was playing it in Japanese and I wrote some language notes in each video description.
  • I uploaded some other types of videos to @focotaku as well, like book reviews of Japanese novels and some other game reviews. But my most viewed video is one a bit removed from “otaku” culture: a summary I did of the BBC documentary on the scandal involving Johnny Kitagawa (it has English subtitles), the deceased top figure of the Japanese music industry and head of a male talent agency. The video has 850 views (perhaps not much, but quite a record for me). I’m glad it’s been informative. I did 3 videos on the topic and they all get quite a few visits.

Learning

  • More Python & some C++ at work
  • I did all the Advent of Code 2023 in Swift, and I did one of the problems in Godot 4 as well, just for the fun of visualizing it: AoC 2023 Day 22 on Instagram.
  • I practiced some Metal again to create Mantis Shrimp.

Leisure

I already mentioned several trips in the first section. I wanted to catch up with some JRPG titles, so I played & cleared these games:

  • Nier Replicant — I enjoyed it very much. The soundtrack is amazing. There are some concerts in Europe in February, including Barcelona, but I failed to get tickets… 😢
  • Nier Automata — amazing as well.
  • Final Fantasy XVI — Epic! It was a present from a friend, and then I also got a PS5 as a present! 🤩 I’m actually replaying it because I enjoyed it very much.

I wasn’t planning to read any novels, but I ended up reading 3 novels in Japanese by the same author, Keigo Higashino. My partner has lots of his novels at home and I got curious. I was surprised I could read them relatively easily. When I tried reading novels in Japanese in the past, it was usually hard. But those difficult novels were SF or JRPG-related, so the vocabulary may have been unusual… I also read another novel in Japanese, “Egoist”, an LGBT story 🏳️‍🌈, and started another one about NieR, so that’s my record for reading Japanese novels so far! And I read a novel in English called “Tomorrow and Tomorrow and Tomorrow”, about game development. It was really moving. See my list of readings.

Wishes for 2024

Again, let’s hope 2024 brings peace ☮️🕊

I haven’t written down my personal goals yet, but I hope I can write some technical blog posts this year. I have one planned about Metal and Mesh shaders, related to the development of Mantis Shrimp.

I also want to upload some videos about Swift development.

I started the year by joining a badminton group 🏸, so hopefully I do more exercise this year!

And I hope to visit Japan again some time this year.

Best wishes to everyone. My dragon-themed new year greetings card is at the top of this article (I was also lagging behind on my new year greetings… I only started drawing it in January…).

Again, Happy New Year 2024 🐲


8+ years programming for fashion
Mon, 01 Jan 2024 12:37:55 +0000
A visual summary of some of the stuff I’ve worked on at Metail

A retrospective

My time at Metail has come to an end, and I thought it was a good time to write a longer retrospective than usual. It’s been an interesting ride, so I think it’s worth looking back and reviewing the things I learned.

Eight and a half years is the longest I’ve ever been at a company. The main reason for staying this long was the incredibly welcoming atmosphere and the friendly people who worked there, but there were also very interesting technical challenges that aligned well with my background. Let me give you a bit more detail.

How I ended up here

I’m a graphics programmer, and before Metail I had always worked in the games industry. The games industry is full of amazing people, very passionate as well. I suppose passion can sometimes transform into angry faces and needless shouting. The year I left the games industry I wasn’t going through the best of times. My mum was fighting cancer, and then she had a stroke. So it was hard to cope with the stress of work. My counsellor suggested that a change might be good.

I got a random phone call from a recruiter, and they talked about Metail. I wasn’t very interested in fashion at the time, but the technology they described sounded quite interesting. They needed someone with knowledge of Computer Graphics, but also with a good understanding of Computer Vision and Image Processing. Although professionally I had mostly worked on graphics and optimization, my PhD was on Computer Vision and Image Processing, so this sounded like a nice combo! I passed the interviews and I got in.

The MeModel era

The main product when I joined Metail was called the MeModel (see figure on top). It was a virtual try-on system where users would enter their measurements to generate a virtual avatar, and then they could try on clothes. It was a web application that retailers could embed in their websites. The technology was a mixture of 2D (photographs of clothes and faces) and 3D (for the body shapes). The garment physics were done in 2D.

The technology I was maintaining was a server-side renderer written in DirectX 10, C#, and C++. After getting familiar with the pipeline and the asset publishing, I started optimizing performance by removing redundant textures and unnecessary processing. Sometimes graphics code becomes spaghetti 🍝, but a simple PIX GPU frame capture can reveal very interesting things quite easily.

I also worked on improving the visuals. I introduced a new avatar with more joints and I contacted an ex-colleague to help us author more poses. I changed the skin shaders, and I wrote a WebGL tool to help us tweak the skin to match the photographic heads (see Skin colour authoring using WebGL).

I also did some server-side work. Because I had some previous experience with NodeJS, I suggested building a small server in NodeJS for scaling and monitoring. It sat on top of AWS services, but it let us do more complex logic suited to our renderer. The new bottleneck was the start-up time of the renderer service: it took several minutes to boot. I went through some old spaghetti code, wrote the underlying math down on paper, and then rewrote the whole thing as simpler matrix multiplications. I also turned most of the asset loading into lazy initializations, for a final start-up time under 2 seconds.

I built several internal visualization tools, and other tools for the outsourcing teams to help them see what they were creating and iterate faster (from days to hours). I also became Engineering Manager of a team of 7 and mentored other developers. I did lots of interesting things.

Unfortunately, the MeModel didn’t quite take off and the company struggled financially until we were acquired by one of our investors.

The EcoShot era

A scan of myself, my scanatar superimposed on a photograph, and a couple of EcoShot renders
From left to right: a scan of myself, my scanatar superimposed on a photograph, and a couple of EcoShot renders

When we shut down the MeModel service I was working on an idea from our CTO. He thought that in order to strive for realism, we needed to do the garment simulation in 3D. We were experimenting with some 3D CAD cloth authoring software at the time, and I thought it would be relatively simple to reuse all the technology we had to create something for that software.

Unfortunately, all the client-side developers had to go, so I had to build everything on my own. But the CAD software lets you write plugins in Python, so it was quick to get started. I like C++, but Python let us build things faster in this scenario.

I started by getting a body scan of myself and using our software to automatically rig it, add some poses, and import it into the CAD software. That’s what we call a “scanatar”, i.e. an avatar created from a scan. When I saw the draping of a single garment in different sizes on an accurate model of my body, I thought this would be a game changer.

I built a beta of the software in a couple of months, all self-contained (there was no service at the time). After the beta, I worked with the network architect to build a service. I built a renderer that used V-Ray to render the garments with raytracing. For the 2D composition I mainly used ImageMagick, plus some OpenCV scripts written by our R&D team.
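
Just to give a flavour of that 2D composition step, here is a minimal ImageMagick sketch. The file names are made up, and the real pipeline was more involved than this:

  # overlay a rendered garment (with transparency) on top of a background photo
  convert background.jpg garment.png -gravity center -composite output.jpg

(In ImageMagick 7 you would use magick instead of convert.)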

Apart from EcoShot, we worked on other projects, such as the European eTryOn project (see my XR4Fashion talk from 2:00:00, From 3D designs to Lens Studio: Challenges in faithful garment representation), and some other AR collaborations with Snap (see me wearing a virtual Puma tracksuit in the figure on top, using one of my Snapchat filters). So I got to touch some game engines as well, like Unity or Lumberyard, and Lens Studio (see some mentions in Reasons for a solo dev to love Godot Engine).

2023 has been an interesting year as well with the boom of Generative AI (GenAI for short). I worked on releasing new features and new GenAI avatars for the EcoShot plugin at a very fast pace. Many customers were impressed by the results, and we've been getting requests for new imagery.

The End & The Future

Unfortunately, we ran out of time. EcoShot will continue to exist in the hands of Tronog (see the announcement: Metail and Tronog enter into a strategic partnership to transfer EcoShot and make AI-generated fashion accessible to all). By the way, can you tell which models are real and which are GenAI?

Image from Metail website showing some EcoShot & GenAI models

There are still many exciting things to come for EcoShot in 2024, but I will be moving on. At the time of writing, I don’t know where to yet, though. It seems it is still too early for many apparel companies to adopt 3D, so I may not work in fashion again. Who knows.

I was attracted to the idea of doing something good for the planet. The fashion industry, especially fast fashion, is a machine for creating waste. Creating virtual samples of garments before they are manufactured should help reduce some of that waste. Also, showing customers how a garment fits different body shapes should help reduce returns. But adoption of these technologies is still slow. I hope that GenAI will revolutionize that.

While I look for my next adventure, I will be working on some side projects. I recently released an image diff app called Mantis Shrimp 🦐. I borrowed the name from a JavaScript web tool made by my team lead when I joined Metail. He loves mantis shrimps because of the 16 or so types of photoreceptor cells in their eyes. I thought it was a nice way of coming full circle.

So long and thanks for all the fish 🐋🌈

Happy New Year 2024 🐲


⏪ Previous year | Next year ⏩