duhan
Blog

Skill Issue: Filtering an XML File with at Least 2 Conditions

I feel you: you are a developer and can’t even filter your XML files with at least 2 conditions. You are not alone. I was in the same boat. But I built a tool for this.

GitHub: https://github.com/duhanmeric/feed-filter

Before telling the story, I need you to understand what a feed is. A feed is a structured data file, often in XML format, that contains detailed information about the products in an online store’s inventory. This file typically includes product attributes such as ID, name, description, price, availability, image URL, and more. It’s useful for sharing your product data with other marketplaces, advertising platforms, and shopping channels. It usually consists of 3 nested tags and looks like this:

<rss>
  <channel>
    <item>
      <id>9778</id>
    </item>
    <!-- more items -->
  </channel>
</rss>

<item> itself has its own product attributes. These attribute names may vary, so you can’t predict the tag names and attribute names beforehand.
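Since the tag names are not known up front, they have to be discovered from the data. Here is a minimal sketch in TypeScript, assuming the <item> elements have already been parsed into plain objects (tag name to text content). The FeedItem type, the collectKeys helper, and the sample data are hypothetical and only for illustration; this is not the code from the repo.

```typescript
// Hypothetical shape: one parsed <item>, mapping child tag name -> text content.
type FeedItem = Record<string, string>;

// Collect every tag name that appears in at least one item, in first-seen order.
function collectKeys(items: FeedItem[]): string[] {
  const keys = new Set<string>();
  for (const item of items) {
    for (const key of Object.keys(item)) {
      keys.add(key);
    }
  }
  return Array.from(keys);
}

// Made-up sample data: the two items do not share the same set of tags.
const sampleItems: FeedItem[] = [
  { id: "9778", title: "Blue Mug", price: "12.50" },
  { id: "9779", title: "Red Mug", "g:availability": "in stock" },
];

console.log(collectKeys(sampleItems));
```

Walking over every item, and not only the first one, matters because some products may carry tags that others don’t.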

Why did I start this project?

A few weeks ago a task came into my inbox. The customer was saying: “You are rendering 14 products from category X and price above Y, but it should be 16. Where are the other 2 products?” The answer wasn’t clear-cut. They have ~8000 products in their feed, so I needed to use 3rd party tools to filter it. But using 3rd party tools with a huge XML file was an emotional drain; it was a laggy experience. So I quickly wrote a script with the help of ChatGPT. The downsides of this script were:

  • You need to hardcode the path to your <item> elements
  • You need to hardcode your desired conditions

Even so, it helped me solve my task that day. I changed the script a little to apply the filters to the feed file and, ta-da, success! But I knew that these types of tasks would come up once in a while, and that as a team we would waste our time applying filters to the feed file directly. Thus, I started to think about the feed-filter project.

Goal

My goal was to filter the products with one or more filters, where only the URL of the XML file is entered as input and the user doesn’t need to know the content of the feed. Because of that, the user also doesn’t need to know the tag names and attribute names. The project should filter the products with the given conditions and return the filtered products in the UI.
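To make “one or more filters” concrete, here is a minimal sketch in TypeScript of how several conditions could be combined. It assumes the products are already parsed into plain objects and that all conditions must hold at once (AND). The Condition shape, the operator names, and the sample products are my assumptions for illustration, not the actual implementation in the repo.

```typescript
// Hypothetical shape: one parsed <item>, mapping child tag name -> text content.
type FeedItem = Record<string, string>;

type Operator = "equals" | "contains" | "gt" | "lt";

// One user-selected filter: which tag, how to compare, and against what.
interface Condition {
  key: string;
  operator: Operator;
  value: string;
}

// Check a single condition. Items that lack the tag never match.
function matches(item: FeedItem, condition: Condition): boolean {
  const actual = item[condition.key];
  if (actual === undefined) return false;
  switch (condition.operator) {
    case "equals":
      return actual === condition.value;
    case "contains":
      return actual.includes(condition.value);
    case "gt":
      // parseFloat reads the leading number, so "120.00 USD" compares as 120.
      return parseFloat(actual) > parseFloat(condition.value);
    case "lt":
      return parseFloat(actual) < parseFloat(condition.value);
  }
}

// Keep only the items that satisfy every condition.
function filterItems(items: FeedItem[], conditions: Condition[]): FeedItem[] {
  return items.filter((item) => conditions.every((c) => matches(item, c)));
}

// Made-up sample data in the spirit of "category X and price above Y".
const products: FeedItem[] = [
  { id: "1", category: "shoes", price: "120.00" },
  { id: "2", category: "shoes", price: "80.00" },
  { id: "3", category: "bags", price: "150.00" },
  { id: "4", category: "shoes" }, // no price tag at all
];

const conditions: Condition[] = [
  { key: "category", operator: "equals", value: "shoes" },
  { key: "price", operator: "gt", value: "100" },
];

// Only the product with id "1" satisfies both conditions.
console.log(filterItems(products, conditions).map((p) => p.id));
```

Because the keys are plain strings, nothing here depends on knowing the tag names in advance; they can come straight from whatever the feed contains.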

I wasn’t sure about the feasibility of the project. I needed to see whether it was possible with these requirements, so I needed to run an experiment. I chose the best tool for this task, the one I know very well: React (Next.js, actually). It was a mistake.

Algorithm

The basic flow of the algorithm is:

  • User enters the URL of the XML file
  • It fetches and parses the XML file
  • It creates a directory inside a predetermined directory (src/uploadedFiles/crypto.randomUUID) and saves the XML file in it under the same name as the directory
  • It generates the available filters (the <item> attributes), writes them to a .json file in the same directory, and then renders the keys in the UI
  • User selects the filters and applies them
  • It filters the products, saves them in a new .json file named total_crypto.randomUUID.json, and returns the filtered products in the UI with pagination
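The last step above mentions pagination. Here is a minimal sketch of that part in TypeScript, assuming the filtered products are already in memory as an array. The paginate helper and the Page type are hypothetical names for illustration, not taken from the repo.

```typescript
// Hypothetical result shape for one page of filtered products.
interface Page<T> {
  items: T[];
  page: number; // 1-based page number actually returned
  pageSize: number;
  totalItems: number;
  totalPages: number;
}

// Slice one page out of the full result list.
// Out-of-range or non-finite page numbers are clamped into the valid range.
function paginate<T>(all: T[], page: number, pageSize: number): Page<T> {
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be a positive integer");
  }
  const totalPages = Math.max(1, Math.ceil(all.length / pageSize));
  const requested = Number.isFinite(page) ? Math.floor(page) : 1;
  const current = Math.min(Math.max(1, requested), totalPages);
  const start = (current - 1) * pageSize;
  return {
    items: all.slice(start, start + pageSize),
    page: current,
    pageSize,
    totalItems: all.length,
    totalPages,
  };
}

// Made-up example: 16 filtered product ids, shown 5 per page.
const ids = Array.from({ length: 16 }, (_, i) => String(i + 1));
const lastPage = paginate(ids, 4, 5);
console.log(lastPage.items, lastPage.totalPages);
```

Returning totalItems and totalPages together with the slice gives the UI everything it needs to render the page controls without loading the whole result again.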

Here is how it works:


Why not in Production?

I decided not to push this into production because of the limits of serverless environments. On a serverless platform such as Vercel, reading and writing files with fs is limited. I could dockerize the app and deploy it to a server, but that would be too much effort for such a simple app. So I decided to keep it as a local development app.


Why not XPath?

I know that XPath is a powerful tool for filtering XML files. But I didn’t want to use it for the reasons below:

  • XPath is not a user-friendly language. It’s hard to understand and write.
  • It requires namespaces to be defined, and defining them beforehand is a pain because I assume that any XML file can be uploaded. For example, you can’t query Google feeds, whose tags use the g: prefix, without defining the namespace.

I know that if I had used XPath, the whole process would have been smoother and faster. On top of that, I would only need to download and save the XML, so I would not need to save .json files.


GitHub: https://github.com/duhanmeric/feed-filter