Stefan Tilkov's Random Stuff

QCon SF 2009: Don Box & Amanda Laucher, Codename "M": Language, Data, and Modeling, Oh My!

These are my unedited notes from Don Box & Amanda Laucher's talk about Codename "M": Language, Data, and Modeling, Oh My!

  • [Don has promised not to tweet. That's a good start.]
  • Interesting similarities between M and MPS
  • M is a language for data - one of the most interesting places data lives is a sequence of Unicode code points, i.e. text
  • Great support for text processing perceived as critical
  • Example: extract some data from text - tweets
  • Intellipad - default environment for writing grammar files, this is where tool support shows up first, then in Visual Studio
  • M code for a language definition (a function from text to something else, declaratively specified):

    module QCon
    {
        language Simple
        {
            syntax Main = empty;
        }
    }
    
  • Main is the rule that is the entry point

  • Open file in three-pane (or rather, four-pane) mode
  • Show source file, grammar, show output - empty file yields Main []
  • Amanda makes a great assistant ;-)
  • Non-empty file produces errors
  • Change "empty" to "any" - file with a single character works, more than that produces errors; change to any+, validates again
  • Simple language for interpreting tweets:

    module QCon
    {
        language Simple
        {
            syntax Main = Tweet;
            syntax Tweet = Content*;
            syntax Content
                = RawText
                | Handle
                | HashTag;
            token RawText = (any - ("#"|"@"))+;
            token Handle = "@" Name;
            token HashTag = "#" Name;
            token Name = (any - ("@"|"#"|" "))+;
        }
    }
    
  • regular expressions at the token layer, context-free grammar at the syntax level
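
To make the two layers concrete, here is a rough Python analogue of the grammar above. It is not M and says nothing about how the M toolchain works internally: it uses one regular expression per token rule, plus a loop standing in for `Tweet = Content*`. The names `TOKEN` and `tokenize` are made up for this sketch.

```python
import re

# Token layer: Name mirrors (any - ("@" | "#" | " "))+,
# RawText mirrors (any - ("#" | "@"))+.
NAME = r"[^@# ]+"
TOKEN = re.compile(
    rf"(?P<Handle>@{NAME})|(?P<HashTag>#{NAME})|(?P<RawText>[^#@]+)"
)


def tokenize(tweet: str) -> list[tuple[str, str]]:
    """Syntax layer: Tweet = Content*, i.e. match Content until the input ends."""
    tokens = []
    pos = 0
    while pos < len(tweet):
        match = TOKEN.match(tweet, pos)
        if match is None:
            raise ValueError(f"no Content alternative matches at offset {pos}")
        # lastgroup is the name of the alternative that matched.
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

In this sketch, an empty string yields an empty list, and a lone "@" or "#" followed by a space is rejected because none of the three alternatives can match it.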

  • Amanda: "Only crap languages make you define something before you have to use it"
  • Discussion between Josh, Amanda, Don about whether or not the grammar is correct
  • Good point about interactive grammar development using the three-pane editor
  • Pattern names used to display data in a structured way
  • Adding @Classification["Keyword"] syntax-colors the source
  • M is structurally typed
  • M consists of lists and records
  • Generating the right-hand side:
  • Intellipad crashes! [Boom!] :-)
  • The spec for M is licensed under the Microsoft OSP (which makes people as happy as they can be working with Microsoft)
  • JavaScript-based implementation of parts of the language; subset of the three-pane mode
  • Toolchain is written in C#, parser written in M, more and more compilers written in M
  • Intellipad crashes! Again! [Boom!] :-)

    module QCon
    {
        language Simple
        {
            syntax Main = v:Tweet => v;
            syntax Tweet = v:Content* => v;
            syntax Content
                = v:RawText => v
                | v:Handle => v
                | v:HashTag => v;
            token RawText = (any - ("#"|"@"))+;
            token Handle = "@" v:Name => v;
            token HashTag = "#" v:Name => v;
            token Name = (any - ("@"|"#"|" "))+;
        }
    }
    
  • bubbles up the actual content, result is just a list of the strings extracted

    module QCon
    {
        language Simple
        {
            syntax Main = v:Tweet => v;
            syntax Tweet = v:Content* => v;
            syntax Content
                = v:RawText => v
                | v:Handle => v
                | v:HashTag => v;
            token RawText = v:(any - ("#"|"@"))+ => { Kind => "RawText", Text => v };
            token Handle = "@" v:Name => { Kind => "Handle", Name => v };
            token HashTag = "#" v:Name => { Kind => "HashTag", Topic => v };
            token Name = (any - ("@"|"#"|" "))+;
        }
    }
    
  • Question from Josh: Isn't this mixing lexing and production rules? Don: Goal is to have no difference, but it can be pulled out
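
The projections (`=> { Kind => ..., ... }`) turn each match into a record. Here is a rough Python analogue, assuming plain dicts can stand in for M records. `parse_tweet` and `VALUE_FIELD` are hypothetical names, and this sketches the idea only, not the M runtime.

```python
import re

NAME = r"[^@# ]+"
# The named groups capture only the part bound to v in the grammar,
# so the leading "@" or "#" is matched but not kept.
TOKEN = re.compile(
    rf"@(?P<Handle>{NAME})|#(?P<HashTag>{NAME})|(?P<RawText>[^#@]+)"
)

# Which field carries the captured value for each Kind.
VALUE_FIELD = {"RawText": "Text", "Handle": "Name", "HashTag": "Topic"}


def parse_tweet(tweet: str) -> list[dict[str, str]]:
    """Return one record per Content match, shaped like the projections above."""
    records = []
    pos = 0
    while pos < len(tweet):
        match = TOKEN.match(tweet, pos)
        if match is None:
            raise ValueError(f"cannot parse tweet at offset {pos}")
        kind = match.lastgroup
        records.append({"Kind": kind, VALUE_FIELD[kind]: match.group(kind)})
        pos = match.end()
    return records
```

A consumer can then ask, for example, `any(r["Kind"] == "HashTag" for r in parse_tweet(text))`.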

  • Next: Consuming stuff
  • Using TDD to parse Don's tweets
  • VS project includes language grammar defined earlier
  • Amanda: Can one debug a grammar? Don: Answered later
  • M runtime can be hosted inside a C# program
  • Recently stopped using .NET 3.5/VS 2008 internally; now exclusively on .NET 4 and VS 2010
  • Language definition is included in the program binary
  • Showing off some

    var language = Language.load(/* MImage */ typeof(Program).Assembly, "QCon", "Simple"); // yields runtime that can parse a program
    dynamic result = language.ParseString(input);
    bool hasHashTag = false;
    foreach (var content in result)
    {
        hasHashTag = content.Kind == "HashTag";
        if (hasHashTag)
            break;
    }
    AssertEqual(true, hasHashTag);
    
  • Demo is actually working

  • Change AssertEqual to Assert.IsTrue to get around exception thrown
  • Question from Ian Robinson: Can LINQ be used to walk the result? Answer: Not yet, as dynamic and LINQ don't mix yet
  • Not going to write any more C# types, has written all that are in him ;-)
  • "This will fail!"

    var language = Language.load(/* MImage */ typeof(Program).Assembly, "QCon", "Simple"); // yields runtime that can parse a program
    dynamic result = language.ParseString(input);
    var query = (from content in ((IEnumerable)result).OfType<dynamic>()
                where content.Kind == "HashTag"
                select content).Any();
    Assert.IsTrue(query);
    
  • It failed indeed.

  • "This talk is not about integration of two features I don't work on"
  • M is optionally typed, structurally typed
  • optional typing: syntax Bob = any* : Integer32 => 42;
  • only partially plumbed in the current version
  • Example for structural typing (not really working yet):

    module Ola 
    {
        type HashTagRec = 
        { 
            Kind : Text where value == "HashTag";
            Topic : Text;
        }
    }
    
  • Rudimentary grammar debugging: set breakpoints in input source text; 4th pane shows up, shows matching stack
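
On the structural typing point: the idea behind a type like `HashTagRec` is that a value conforms because of its shape, not because of what created it. Here is a minimal Python sketch of that check. `is_hashtag_rec` is a hypothetical helper, and tolerating extra fields is an assumption of this sketch, not something taken from the talk.

```python
def is_hashtag_rec(value: object) -> bool:
    """Structural check: any dict with the right fields conforms."""
    return (
        isinstance(value, dict)
        # Kind : Text where value == "HashTag"
        and value.get("Kind") == "HashTag"
        # Topic : Text
        and isinstance(value.get("Topic"), str)
    )
```

For example, `{"Kind": "HashTag", "Topic": "qcon"}` passes the check, while `{"Kind": "Handle", "Name": "don"}` does not.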

  • Syntactic and semantic editing support
  • Semantics relies on hooks that are not there yet
  • M is built using an M Grammar
  • Language completion for M is built using C#
  • Ambiguators for GLR need to be written in C#
  • One of the metrics used to evaluate the language: XML. There is a grammar for XML, briefly demoed.
  • MPS guys have a different religion – Microsoft believes people have a text editor they love
  • <?magnum PI?> :-)
  • Comment from Martin Fowler: Main difference from classic tools such as ANTLR is the dynamic nature - no code generation needed
  • Comment from Ola Bini: Didn't see the type-checking and debugging before – now he's impressed